<h1>Introduction to Pandas for Data Analysis</h1>

<h3>Objectives</h3>
<hr>
<ol>
    <li>Learn what Pandas Series are and how to create them.</li>
    <li>Understand how to access and manipulate data within a Series.</li>
    <li>Discover the basics of creating and working with Pandas DataFrames.</li>
    <li>Learn how to access, modify, and analyze data in DataFrames.</li>
    <li>Gain insights into common DataFrame attributes and methods.</li>
</ol>

<h3>What is Pandas?</h3>
<hr>
Pandas is a popular open-source data manipulation and analysis library for the Python programming language. It provides a flexible set of tools for working with structured data, such as tabular and time series data, which has made it a standard part of the workflow for data scientists, analysts, and engineers.

Here are some key features and functionalities of Pandas:

Data Structures: Pandas offers two primary data structures - DataFrame and Series.

<ul>
    <li>A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).</li>
    <li>A Series is a one-dimensional labeled array, essentially a single column or row of data.</li>
</ul>

Data Import and Export: Pandas makes it easy to read data from various sources, including CSV files, Excel spreadsheets, SQL databases, and more. It can also export data to these formats, enabling seamless data exchange.

Data Merging and Joining: You can combine multiple DataFrames using methods like merge and join, similar to SQL operations, to create more complex datasets from different sources.
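
As a rough illustration of SQL-style joins, the sketch below merges two small tables on a shared key. The table names, column names, and values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical tables invented for this example
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
})
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 1, 3],
    'amount': [250, 120, 75],
})

# Inner join: keep only customers that have at least one order
inner = pd.merge(customers, orders, on='customer_id', how='inner')
print(inner)

# Left join: keep every customer; Bob has no orders, so his order columns are NaN
left = customers.merge(orders, on='customer_id', how='left')
print(left)
```

The how argument plays the role of the join type in SQL; 'inner' and 'left' are shown here, and 'right' and 'outer' are also available.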

Efficient Indexing: Pandas provides efficient indexing and selection methods, allowing you to access specific rows and columns of data quickly.
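
The three most common selection styles can be sketched as follows: label-based with loc, position-based with iloc, and filtering with a boolean mask. The index labels and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical data with made-up string labels as the row index
people = pd.DataFrame(
    {'Age': [25, 30, 35], 'City': ['New York', 'San Francisco', 'Los Angeles']},
    index=['a1', 'b2', 'c3'],
)

by_label = people.loc['b2', 'City']    # label-based: row 'b2', column 'City'
by_position = people.iloc[0, 0]        # position-based: first row, first column
over_28 = people[people['Age'] > 28]   # boolean mask: rows where Age exceeds 28

print(by_label)
print(by_position)
print(over_28)
```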

Custom Data Structures: Pandas can be extended when the built-in behavior does not fit your needs, for example by registering custom accessors, defining extension data types, or subclassing Series and DataFrame.

In [None]:
import pandas as pd

In [None]:
import pandas as pd
# Read the CSV file into a DataFrame
df = pd.read_csv('your_file.csv')
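
The cell above assumes that a file named 'your_file.csv' exists on disk. The following is a minimal self-contained sketch of the same round trip that uses an in-memory string in place of a file; the CSV content is hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'your_file.csv'
csv_text = "Name,Age\nAlice,25\nBob,30\n"

# read_csv accepts a file path or any file-like object
df_in = pd.read_csv(io.StringIO(csv_text))
print(df_in)

# With no path argument, to_csv returns the CSV text as a string
csv_out = df_in.to_csv(index=False)
print(csv_out)
```

Passing index=False leaves the row index out of the output, which is usually what you want when the index is just the default 0, 1, 2, ... numbering.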

<h3>What is a Series?</h3>
<hr>

A Series is a one-dimensional labeled array in Pandas. It can be thought of as a single column of data with labels or indices for each element. You can create a Series from various data sources, such as lists, NumPy arrays, or dictionaries.
Here's a basic example of creating a Series in Pandas:

In [None]:
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

print(s[2])     # Access the element with label 2 (value 30)

print(s.iloc[3]) # Access the element at position 3 (value 40)

print(s[1:4])   # Slice by position: elements at positions 1, 2, and 3 (values 20, 30, 40)
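
The example above builds a Series from a list. As a small sketch of the other two sources mentioned earlier, the cell below creates one Series from a dictionary and one from a NumPy array; the names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})
print(prices['banana'])      # label-based access

# From a NumPy array, with explicit labels supplied through index=
arr = np.array([10, 20, 30])
labeled = pd.Series(arr, index=['x', 'y', 'z'])
print(labeled.loc['y'])      # label-based access
print(labeled.iloc[0])       # position-based access
```

Using loc for labels and iloc for positions makes the intent explicit, which matters most when the index itself contains integers.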

<h3>Series Attributes and Methods</h3><hr>

Pandas Series come with various attributes and methods to help you manipulate and analyze data effectively. Here are a few essential ones:

<ul>
    <li>values: Returns the Series data as a NumPy array.</li>
    <li>index: Returns the index (labels) of the Series.</li>
    <li>shape: Returns a tuple representing the dimensions of the Series.</li>
    <li>size: Returns the number of elements in the Series.</li>
    <li>mean(), sum(), min(), max(): Calculate summary statistics of the data.</li>
    <li>unique(), nunique(): Get unique values or the number of unique values.</li>
    <li>sort_values(), sort_index(): Sort the Series by values or index labels.</li>
    <li>isnull(), notnull(): Check for missing (NaN) or non-missing values.</li>
    <li>apply(): Apply a custom function to each element of the Series.</li>
</ul>
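
To make a few of these concrete, here is a short sketch on a small Series that contains a duplicate and a missing value. The data is made up, and the variable name scores is only illustrative.

```python
import numpy as np
import pandas as pd

scores = pd.Series([30, 10, 20, 10, np.nan])

print(scores.shape)           # (5,)
print(scores.size)            # 5, missing values still count as elements
print(scores.sum())           # 70.0, NaN is skipped by default
print(scores.mean())          # 17.5, the mean of the four non-missing values
print(scores.nunique())       # 3, NaN is excluded by default
print(scores.isnull().sum())  # 1 missing value

# sort_values returns a new Series; NaN is placed last by default
print(scores.sort_values())

# apply runs the function on every element
doubled = scores.apply(lambda x: x * 2)
print(doubled)
```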






<h3>What is a DataFrame?</h3><hr>

A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. Think of it as a table where each column represents a variable, and each row represents an observation or data point. DataFrames are suitable for a wide range of data, including structured data from CSV files, Excel spreadsheets, SQL databases, and more.

In [None]:
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)

print(df['Name'])  # Access the 'Name' column

print(df.iloc[2])   # Access the third row by position
print(df.loc[1])    # Access the second row by label

print(df[['Name', 'Age']])  # Select specific columns
print(df[1:3])             # Select specific rows

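unique_ages = df['Age'].unique()  # Find the unique values in the 'Age' column
print(unique_ages)

older_than_25 = df[df['Age'] > 25]  # Conditional filtering: keep rows where Age is greater than 25
print(older_than_25)

df.to_csv('trading_data.csv', index=False)  # Save the DataFrame to a CSV file without the index column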

<h3>DataFrame Attributes and Methods</h3><hr>

DataFrames provide numerous attributes and methods for data manipulation and analysis, including:

<ul>
    <li>shape: Returns the dimensions (number of rows and columns) of the DataFrame.</li>
    <li>info(): Provides a summary of the DataFrame, including data types and non-null counts.</li>
    <li>describe(): Generates summary statistics for numerical columns.</li>
    <li>head(), tail(): Displays the first or last n rows of the DataFrame.</li>
    <li>mean(), sum(), min(), max(): Calculate summary statistics for columns.</li>
    <li>sort_values(): Sort the DataFrame by one or more columns.</li>
    <li>groupby(): Group data based on specific columns for aggregation.</li>
    <li>fillna(), drop(), rename(): Fill in missing values, drop rows or columns, or rename columns.</li>
    <li>apply(): Apply a function to each row or column of the DataFrame.</li>
</ul>
<a href="https://pandas.pydata.org/docs/">Pandas Official website</a>
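
The following sketch chains several of the methods listed above on a small table. The Salary column and all values are hypothetical, chosen so that one entry is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the Salary column and its values are invented
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Salary': [70000.0, np.nan, 90000.0, 60000.0],
})

print(people.shape)        # (rows, columns)
print(people.head(2))      # first two rows
people.info()              # dtypes and non-null counts (prints directly)
print(people.describe())   # summary statistics for numeric columns

# Replace the missing salary with the mean of the non-missing salaries
filled = people.fillna({'Salary': people['Salary'].mean()})

# Average salary per city
by_city = filled.groupby('City')['Salary'].mean()
print(by_city)

# Sort by salary, highest first, then rename the column
ranked = (
    filled.sort_values('Salary', ascending=False)
          .rename(columns={'Salary': 'AnnualSalary'})
)
print(ranked)
```

Each of these methods returns a new object rather than modifying people in place, which is why the results are assigned to new variables.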


<h3>Conclusion</h3><hr>

Pandas Series and DataFrames are the core structures for data manipulation and analysis in Python. A Series handles one-dimensional labeled data, while a DataFrame offers a versatile, table-like structure for two-dimensional data. Together with their attributes and methods, they cover the common steps of cleaning, exploring, transforming, and analyzing data.

To further your skills in data analysis with Pandas, consider the following next steps:

Practice:
Work with real datasets to apply what you've learned and gain hands-on experience.

Explore Documentation:
Visit the Pandas official website to explore the extensive documentation and discover more functions and methods.