Pandas is a powerful Python library for data manipulation and analysis. It provides data structures like DataFrame and Series, making it easy to handle and analyze structured data. With functions for cleaning, filtering, and transforming data, Pandas is widely used in data science and analysis tasks.

# Why do we use pandas?


*   Data structures (Series and DataFrame)
*   Ease of data cleaning
*   Data Exploration
*   Handling different data types



In [None]:
# run pip install pandas if you can't import pandas
import pandas as pd

DataFrames are two-dimensional, tabular data structures in the Pandas library. They are similar to spreadsheets or SQL tables, with rows and columns. Each column in a DataFrame can be of a different data type (e.g., integers, floats, strings), and you can perform various operations like filtering, grouping, and merging to analyze and manipulate the data efficiently. DataFrames are a key feature of Pandas, making it easier to work with and analyze structured data.

In [None]:
# Creating a DataFrame from a dictionary (keys become column names, lists become column values)
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 24, 22],
        'City': ['New York', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)
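
To make the relationship between the two data structures concrete, here is a minimal sketch that rebuilds the same table and checks a few of its properties. The variable names `people` and `ages` are only illustrative.

```python
import pandas as pd

# Same dictionary as above: one key per column
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 24, 22],
        'City': ['New York', 'Los Angeles', 'Chicago']}
people = pd.DataFrame(data)

# Selecting a single column returns a Series
ages = people['Age']
print(type(ages).__name__)   # Series

# shape is (rows, columns)
print(people.shape)          # (3, 3)

# Each column carries its own data type
print(people.dtypes)
```

Each column keeps its own dtype, which is why numeric and text columns can sit side by side in one DataFrame.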

Loading and saving data

In [None]:
# Reading from a CSV file
df_csv = pd.read_csv('example.csv')

# Saving to a CSV file
df.to_csv('new_dataframe.csv', index=False)
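
The following sketch shows a CSV round trip without touching the disk, which may help if `example.csv` is not available. It writes to an in-memory `io.StringIO` buffer instead of a file path; the names `df_out`, `df_in`, and `with_index` are illustrative.

```python
import io
import pandas as pd

df_out = pd.DataFrame({'Name': ['John', 'Alice'], 'Age': [28, 24]})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df_out.to_csv(buffer, index=False)

# Rewind and read the same text back
buffer.seek(0)
df_in = pd.read_csv(buffer)
print(df_in.equals(df_out))  # True

# Without index=False, the row index is written as an extra unnamed column
buffer_with_index = io.StringIO()
df_out.to_csv(buffer_with_index)
buffer_with_index.seek(0)
with_index = pd.read_csv(buffer_with_index)
print(list(with_index.columns))  # ['Unnamed: 0', 'Name', 'Age']
```

Passing `index=False` when saving is what keeps the extra `Unnamed: 0` column from appearing when the file is read back.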

In Pandas, you can inspect your data using various methods. Some common ones include:

1. **head():** Displays the first few rows of the DataFrame.
   ```python
   df.head()
   ```

2. **tail():** Shows the last few rows of the DataFrame.
   ```python
   df.tail()
   ```

3. **info():** Prints a concise summary of the DataFrame: column names, data types, non-null counts per column (from which missing values can be inferred), and memory usage.
   ```python
   df.info()
   ```

4. **describe():** Generates descriptive statistics, like mean and standard deviation, for numerical columns.
   ```python
   df.describe()
   ```

5. **shape:** An attribute (not a method, so no parentheses) that holds the number of rows and columns as a tuple.
   ```python
   df.shape
   ```

These functions help you get a quick overview of your dataset and understand its structure and content.
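
As a rough illustration of what these calls return, the sketch below runs them on a small three-row table. The name `sample` and the chosen row counts are assumptions made for the example.

```python
import pandas as pd

sample = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                       'Age': [28, 24, 22],
                       'City': ['New York', 'Los Angeles', 'Chicago']})

# head/tail accept the number of rows to return (default is 5)
first_two = sample.head(2)
last_one = sample.tail(1)

# shape is a (rows, columns) tuple
n_rows, n_cols = sample.shape
print(n_rows, n_cols)  # 3 3

# describe() summarises numeric columns only by default, so only 'Age' appears
stats = sample.describe()
print(list(stats.columns))  # ['Age']
print(stats.loc['min', 'Age'], stats.loc['max', 'Age'])  # 22.0 28.0

# info() prints its summary directly
sample.info()
```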

Data Selection and Indexing

In [None]:
# Selecting columns
df['Name']

# Filtering rows
df[df['Age'] > 25]

# Setting a new index (returns a new DataFrame, so df keeps 'Name' as a regular column for later cells)
df_indexed = df.set_index('Name')
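
The sketch below puts these selection steps together on a concrete table and adds label-based and position-based lookups with `.loc` and `.iloc`, which are not covered above but are part of the standard pandas indexing API. The names `people`, `older`, and `by_name` are illustrative.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                       'Age': [28, 24, 22],
                       'City': ['New York', 'Los Angeles', 'Chicago']})

# A single label returns a Series; a list of labels returns a DataFrame
names = people['Name']
name_and_city = people[['Name', 'City']]

# A boolean mask keeps only the rows where the condition is True
older = people[people['Age'] > 25]
print(older['Name'].tolist())  # ['John']

# set_index returns a new DataFrame; 'people' itself is unchanged
by_name = people.set_index('Name')

# .loc looks up by label, .iloc by integer position
alice_age = by_name.loc['Alice', 'Age']
first_row = people.iloc[0]
print(alice_age)          # 24
print(first_row['City'])  # New York
```

Keeping the indexed copy in a separate variable means later code that still expects a `Name` column continues to work.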

Data Cleaning

In [None]:
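# Handling missing values: dropna() returns a new DataFrame without the rows that contain NaN
df_no_missing = df.dropna()

# Removing duplicates: drop_duplicates() also returns a new DataFrame and leaves df unchanged
df_unique = df.drop_duplicates()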
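
Because the table built earlier has no missing or duplicated rows, the effect of these calls is easier to see on data that does. Here is a small sketch with a made-up table `raw` containing one missing age and one repeated row; filling with the column mean is just one possible strategy, chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one duplicated row (Alice) and one missing age (Bob)
raw = pd.DataFrame({'Name': ['John', 'Alice', 'Alice', 'Bob'],
                    'Age': [28, 24, 24, np.nan]})

# Count missing values per column
missing_per_column = raw.isna().sum()
print(missing_per_column['Age'])  # 1

# Drop rows that contain any missing value
no_missing = raw.dropna()
print(len(no_missing))  # 3

# Alternatively, fill the gap instead of dropping the row
filled = raw.fillna({'Age': raw['Age'].mean()})

# Drop rows that are exact duplicates of an earlier row
deduped = raw.drop_duplicates()
print(deduped['Name'].tolist())  # ['John', 'Alice', 'Bob']

# None of the calls above modified the original
print(len(raw))  # 4
```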

Data Manipulation with pandas

In [None]:
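# Merging DataFrames (df1 and df2 are placeholders for two DataFrames that share 'common_column')
merged_df = pd.merge(df1, df2, on='common_column')

# Grouping and aggregating: select the numeric column first, since recent pandas versions raise a TypeError when mean() meets a text column
grouped_df = df.groupby('City')['Age'].mean()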

# Applying functions
df['Age'] = df['Age'].apply(lambda x: x + 1)
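
Since `df1` and `df2` are not defined in this notebook, here is a self-contained sketch of the same three operations. The tables `people` and `cities`, and the `State` and `Age_next_year` columns, are invented for the example.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Carol'],
                       'Age': [28, 24, 22, 30],
                       'City': ['New York', 'Chicago', 'Chicago', 'New York']})
cities = pd.DataFrame({'City': ['New York', 'Chicago'],
                       'State': ['NY', 'IL']})

# Merge on the shared 'City' column (inner join by default)
merged = pd.merge(people, cities, on='City')
print(merged.shape)  # (4, 4)

# Average age per city, computed on the numeric column only
mean_age = people.groupby('City')['Age'].mean()
print(mean_age['Chicago'], mean_age['New York'])  # 23.0 29.0

# apply() runs a Python function on each value ...
via_apply = people['Age'].apply(lambda x: x + 1)

# ... but simple arithmetic can be written directly on the column
people['Age_next_year'] = people['Age'] + 1
print(via_apply.equals(people['Age_next_year']))  # True
```

For simple arithmetic the direct column expression gives the same result as `apply` and is usually faster; `apply` is more useful when the per-value logic cannot be expressed as a column operation.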

Data Visualization with pandas

In [None]:
# Plotting data (requires matplotlib; 'Name' must be a regular column here, not the index)
df.plot(kind='bar', x='Name', y='Age')

Understanding Categorical and Numerical Data

Pandas provides various methods to handle both categorical and numerical data:

**Handling Numerical Data:**
1. **Descriptive Statistics:** Use `describe()` to get a statistical summary of the numerical columns.
   ```python
   df.describe()
   ```

2. **Math Operations:** Perform mathematical operations on numerical columns.
   ```python
   df['numerical_column'].mean()
   ```

**Handling Categorical Data:**
1. **Value Counts:** Check the distribution of categorical values.
   ```python
   df['categorical_column'].value_counts()
   ```

2. **Label Encoding:** Convert categorical values to integer labels. This example uses scikit-learn, which is installed separately from pandas.
   ```python
   from sklearn.preprocessing import LabelEncoder
   le = LabelEncoder()
   df['encoded_column'] = le.fit_transform(df['categorical_column'])
   ```

3. **One-Hot Encoding:** Create binary columns for each category (useful for machine learning models).
   ```python
   df_encoded = pd.get_dummies(df, columns=['categorical_column'])
   ```

These methods allow you to analyze, preprocess, and prepare both numerical and categorical data for various data science tasks.
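
The snippets above use placeholder column names, so here is a runnable sketch on a small invented table `df_cat` with one categorical column (`color`) and one numerical column (`price`). For label encoding it uses `pd.factorize` as a pandas-only stand-in for `LabelEncoder`; note that `factorize` numbers categories in order of first appearance, whereas `LabelEncoder` sorts them, so the integer codes can differ between the two.

```python
import pandas as pd

# Hypothetical data for illustration
df_cat = pd.DataFrame({'color': ['red', 'blue', 'red', 'green'],
                       'price': [10.0, 12.5, 9.0, 11.0]})

# Numerical column: a single summary statistic
print(df_cat['price'].mean())  # 10.625

# Categorical column: how often each value occurs
counts = df_cat['color'].value_counts()
print(counts['red'])  # 2

# Integer labels without scikit-learn (codes follow order of first appearance)
codes, uniques = pd.factorize(df_cat['color'])
print(codes.tolist())  # [0, 1, 0, 2]
print(list(uniques))   # ['red', 'blue', 'green']

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df_cat, columns=['color'])
print(list(one_hot.columns))  # ['price', 'color_blue', 'color_green', 'color_red']
```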

Reshaping Data

In [None]:
# Melting DataFrames
df_melted = pd.melt(df, id_vars=['Name'], var_name='Attribute', value_name='Value')

# Pivoting DataFrames (pivot the melted frame back to wide form; df itself has no 'Attribute' column)
df_pivoted = df_melted.pivot(index='Name', columns='Attribute', values='Value')
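
To show how melting and pivoting relate to each other, the following sketch converts a wide table to long form and back again. It assumes `Name` is a regular column; the names `wide`, `long_df`, and `back` are illustrative.

```python
import pandas as pd

wide = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                     'Age': [28, 24, 22],
                     'City': ['New York', 'Los Angeles', 'Chicago']})

# Wide to long: one row per (Name, Attribute) pair
long_df = pd.melt(wide, id_vars=['Name'],
                  var_name='Attribute', value_name='Value')
print(long_df.shape)  # (6, 3)

# Long to wide: each distinct Attribute becomes a column again
back = long_df.pivot(index='Name', columns='Attribute', values='Value')
print(sorted(back.columns))      # ['Age', 'City']
print(back.loc['Alice', 'Age'])  # 24
print(back.loc['Bob', 'City'])   # Chicago
```

After the round trip, `Name` is the index of `back` rather than a column, and the `Value` column mixed numbers with text along the way, so the original column dtypes are not restored automatically.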