
**1. Creating Series and DataFrames**

In [None]:
import pandas as pd

# Creating a Series
data_series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print("Series:\n", data_series)

# Creating a DataFrame from a dictionary
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data_dict)
print("\nDataFrame:\n", df)

Series:
 a    10
b    20
c    30
d    40
e    50
dtype: int64

DataFrame:
       Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
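
Once the Series and DataFrame above exist, individual values can be read back by label or by position. The following is a minimal sketch of that, reusing the same data; the variable names `value_c`, `first_value`, and `ages` are introduced here for illustration only.

```python
import pandas as pd

data_series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
})

# Label-based lookup uses the custom index ('a'..'e')
value_c = data_series['c']

# Position-based lookup ignores the labels
first_value = data_series.iloc[0]

# Selecting one DataFrame column returns a Series
ages = df['Age']

# With no index given, the DataFrame gets a default 0..n-1 index
print(value_c, first_value, ages.tolist(), df.index.tolist())
```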


**2. Data Handling with Pandas**

We’ll build DataFrames from different data structures, handle missing values, and convert column data types.

**Creating DataFrames from Lists and Dictionaries**

In [None]:
import pandas as pd

# Creating DataFrame from a list of dictionaries
data_list_of_dicts = [
    {'Name': 'Eve', 'Age': 45, 'Salary': 90000},
    {'Name': 'Frank', 'Age': 50, 'Salary': 100000},
    {'Name': 'Grace', 'Age': 29, 'Salary': 55000}
]
df_list_of_dicts = pd.DataFrame(data_list_of_dicts)
print("\nDataFrame from list of dictionaries:\n", df_list_of_dicts)

# Creating DataFrame from a dictionary of lists
data_dict_of_lists = {
    'Name': ['Hannah', 'Isaac', 'Judy'],
    'Age': [27, 33, 29],
    'Salary': [65000, 70000, 60000]
}
df_dict_of_lists = pd.DataFrame(data_dict_of_lists)
print("\nDataFrame from dictionary of lists:\n", df_dict_of_lists)



DataFrame from list of dictionaries:
     Name  Age  Salary
0    Eve   45   90000
1  Frank   50  100000
2  Grace   29   55000

DataFrame from dictionary of lists:
      Name  Age  Salary
0  Hannah   27   65000
1   Isaac   33   70000
2    Judy   29   60000
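
The two input shapes shown above are interchangeable: a list of dictionaries describes the data row by row, while a dictionary of lists describes it column by column. The sketch below, using a shortened illustrative data set, checks that both produce the same DataFrame and shows what happens when a record lacks a key.

```python
import pandas as pd

# The same records expressed row-wise and column-wise
rows = [
    {'Name': 'Eve', 'Age': 45},
    {'Name': 'Frank', 'Age': 50},
]
columns = {'Name': ['Eve', 'Frank'], 'Age': [45, 50]}

df_rows = pd.DataFrame(rows)
df_columns = pd.DataFrame(columns)

# equals() compares values, dtypes, and index
same = df_rows.equals(df_columns)
print("Identical:", same)

# A key missing from one record becomes NaN in that row
df_gap = pd.DataFrame([{'Name': 'Eve', 'Age': 45}, {'Name': 'Frank'}])
print(df_gap)
```

The row-wise form tolerates incomplete records, whereas a dictionary of lists requires all lists to have the same length.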


**Handling Missing Data**

In [None]:
import pandas as pd
import numpy as np
# Introduce missing data
df_missing = pd.DataFrame({
    'Name': ['Kelly', 'Leo', 'Mia'],
    'Age': [np.nan, 28, 32],
    'Salary': [np.nan, 72000, 65000]
})
print("\nDataFrame with missing data:\n", df_missing)

# Drop rows with missing values
df_cleaned = df_missing.dropna()
print("\nDataFrame after dropping rows with missing data:\n", df_cleaned)

# Fill missing values with each column's mean.
# Assigning the result back avoids calling fillna(inplace=True) on a selected column,
# which recent pandas versions warn about.
df_missing['Age'] = df_missing['Age'].fillna(df_missing['Age'].mean())
df_missing['Salary'] = df_missing['Salary'].fillna(df_missing['Salary'].mean())
print("\nDataFrame after filling missing data:\n", df_missing)


DataFrame with missing data:
     Name   Age   Salary
0  Kelly   NaN      NaN
1    Leo  28.0  72000.0
2    Mia  32.0  65000.0

DataFrame after dropping rows with missing data:
   Name   Age   Salary
1  Leo  28.0  72000.0
2  Mia  32.0  65000.0

DataFrame after filling missing data:
     Name   Age   Salary
0  Kelly  30.0  68500.0
1    Leo  28.0  72000.0
2    Mia  32.0  65000.0
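
Before choosing between dropping and filling, it often helps to see how many values are missing in each column. The sketch below counts them with `isna().sum()` and then fills both columns in a single `fillna` call by passing a dictionary; `missing_counts`, `fill_values`, and `df_filled` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Name': ['Kelly', 'Leo', 'Mia'],
    'Age': [np.nan, 28, 32],
    'Salary': [np.nan, 72000, 65000]
})

# Count missing values per column
missing_counts = df_missing.isna().sum()
print(missing_counts)

# Map each column to its own fill value; fillna returns a new DataFrame
fill_values = {
    'Age': df_missing['Age'].mean(),
    'Salary': df_missing['Salary'].mean(),
}
df_filled = df_missing.fillna(fill_values)
print(df_filled)
```

Because `fillna` returns a new object here, `df_missing` keeps its gaps and both versions stay available for comparison.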


**Data Type Conversions**

In [None]:
import pandas as pd
import numpy as np
# Introduce missing data
df_missing = pd.DataFrame({
    'Name': ['Kelly', 'Leo', 'Mia'],
    'Age': [25, 28, 32],
    'Salary': [np.nan, 72000, 65000]
})
# Convert 'Age' column from integer to float type
df_missing['Age'] = df_missing['Age'].astype(float)
print("\nDataFrame with 'Age' as float:\n", df_missing)


DataFrame with 'Age' as float:
     Name   Age   Salary
0  Kelly  25.0      NaN
1    Leo  28.0  72000.0
2    Mia  32.0  65000.0
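
The `Salary` column above stays float because it contains `NaN`, which a plain integer column cannot hold. As a hedged sketch of one way around this, the example below uses pandas' nullable `'Int64'` dtype; the names `df_types`, `conversion_failed`, and `salary_int` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df_types = pd.DataFrame({
    'Name': ['Kelly', 'Leo', 'Mia'],
    'Age': [25, 28, 32],
    'Salary': [np.nan, 72000, 65000]
})

# Inspect the current dtype of every column
print(df_types.dtypes)

# Plain int cannot represent NaN, so this conversion raises an error
try:
    df_types['Salary'].astype(int)
    conversion_failed = False
except (ValueError, TypeError):
    conversion_failed = True
print("astype(int) failed:", conversion_failed)

# The nullable 'Int64' dtype (capital I) keeps the gap as <NA>
salary_int = df_types['Salary'].astype('Int64')
print(salary_int)
```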


**3. Data Analysis with Pandas**

We will compute summary statistics, group and aggregate data, and combine DataFrames.

**Summary Statistics**

In [None]:
import pandas as pd

# Creating a DataFrame
data = {
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
}
df = pd.DataFrame(data)

# Summary statistics
print("\nSummary Statistics:\n", df.describe())


Summary Statistics:
              Age        Salary
count   5.000000      5.000000
mean   35.000000  70000.000000
std     7.905694  15811.388301
min    25.000000  50000.000000
25%    30.000000  60000.000000
50%    35.000000  70000.000000
75%    40.000000  80000.000000
max    45.000000  90000.000000
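
Each row of the `describe()` output can also be computed on its own. The sketch below does that for a few statistics and illustrates that the `std` row is the sample standard deviation (`ddof=1`), not the population one; the variable names are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
})

# Individual statistics, one column at a time
mean_age = df['Age'].mean()
median_salary = df['Salary'].median()

# std() defaults to ddof=1 (sample); pass ddof=0 for the population value
sample_std = df['Age'].std()
population_std = df['Age'].std(ddof=0)
print(mean_age, median_salary, sample_std, population_std)

# Several chosen statistics for every column in one call
summary = df.agg(['mean', 'median', 'min', 'max'])
print(summary)
```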


**Grouping and Aggregation**

In [None]:
import pandas as pd

# Creating a DataFrame
data = {
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 65000, 55000, 70000]
}
df = pd.DataFrame(data)

# Group by 'Department' and calculate the mean salary
grouped = df.groupby('Department').mean()
print("\nGrouped Data and Aggregates:\n", grouped)


Grouped Data and Aggregates:
              Salary
Department         
Finance     70000.0
HR          52500.0
IT          62500.0
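
A group can be summarized with more than one statistic at a time. The following is a minimal sketch using named aggregation on the same data; the output column names `mean_salary`, `max_salary`, and `headcount` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 65000, 55000, 70000]
})

# Each keyword names an output column: (source column, aggregation)
dept_stats = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
    headcount=('Salary', 'count'),
)
print(dept_stats)
```

Selecting the column explicitly in each tuple also keeps the aggregation well defined if non-numeric columns are later added to the DataFrame.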


**Concatenating DataFrames**

In [None]:
import pandas as pd

# Creating DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aneel', 'Bharath', 'Charan']
})

df2 = pd.DataFrame({
    'ID': [4, 5],
    'Name': ['mohan', 'lohith']
})
# Concatenating DataFrames
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated DataFrame:\n", concatenated_df)


Concatenated DataFrame:
    ID     Name
0   1    Aneel
1   2  Bharath
2   3   Charan
3   4    mohan
4   5   lohith
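
Concatenation stacks rows from DataFrames that share the same columns. Combining tables that share a key column instead is a different operation, handled by `pd.merge`. The sketch below assumes a hypothetical `departments` table that is not part of the original data, and compares an inner join with a left join on `ID`.

```python
import pandas as pd

employees = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Aneel', 'Bharath', 'Charan']
})

# Hypothetical lookup table, invented for this example
departments = pd.DataFrame({
    'ID': [1, 2, 4],
    'Department': ['HR', 'IT', 'Finance']
})

# Inner join keeps only IDs present in both tables
inner = pd.merge(employees, departments, on='ID', how='inner')
print(inner)

# Left join keeps every employee; unmatched rows get NaN
left = pd.merge(employees, departments, on='ID', how='left')
print(left)
```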
