
**🐼 Pandas Tutorial for Beginners** - ® Lovnish Verma

Pandas is a powerful Python library for data analysis and manipulation. It provides two main data structures:



*   **Series** – A one-dimensional labeled array
*   **DataFrame** – A two-dimensional labeled table

Let’s get started with Pandas basics, covering data structures, essential operations, and real-world examples.

1️⃣ Installation
If you haven't installed Pandas yet, use:



```
pip install pandas
```



Or, in Google Colab or a Jupyter Notebook, run it from a cell by prefixing the command with `!`:

In [119]:
!pip install pandas



2️⃣ Importing Pandas

In [120]:
import pandas as pd

3️⃣ Pandas Series (1D Data Structure)
A Series is like a column in Excel, consisting of data and an index.

In [121]:
# Creating a Series
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

0    10
1    20
2    30
3    40
dtype: int64


Custom Index in a Series

In [122]:
series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(series)

a    10
b    20
c    30
dtype: int64


Accessing Series Elements

In [123]:
print(series['b'])  # Output: 20

20
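
Elements can be read by position as well as by label. The following is a minimal sketch that rebuilds the same `series`; `.loc` looks up by index label and `.iloc` by integer position. The variable names `by_label`, `by_position`, and `label_slice` are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = series.loc['b']          # look up by index label
by_position = series.iloc[1]        # look up by integer position (0-based)
label_slice = series.loc['a':'b']   # label slices include both endpoints

print(by_label, by_position)
print(label_slice)
```

Note that a label slice such as `'a':'b'` includes the end label, unlike ordinary Python slicing by position.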


4️⃣ Pandas DataFrame (2D Data Structure)
A DataFrame is like an Excel spreadsheet with rows and columns.

Creating a DataFrame

In [124]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
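
To connect the two data structures, here is a small sketch that rebuilds the same DataFrame and inspects it. It shows that the table has a shape and column labels, and that each column is itself a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

print(df.shape)           # (number of rows, number of columns)
print(list(df.columns))   # column labels
print(type(df['Age']))    # a single column is a pandas Series
```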


Python Code to Generate **data.csv**

In [125]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
    'Age': [25, 30, 35, 40, 28, 33, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco', 'Miami', 'Houston', 'Boston'],
    'Salary': [50000, 60000, 70000, 80000, 55000, 62000, 58000]
}

df = pd.DataFrame(data)

# Save to CSV
df.to_csv('data.csv', index=False)

print("data.csv file created successfully!")


data.csv file created successfully!


5️⃣ Reading & Writing Data

In [126]:
# Read the CSV file into a DataFrame
df = pd.read_csv('data.csv')

Writing to a CSV File

In [127]:
df.to_csv('output.csv', index=False)
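
The `index=False` argument matters when the file is read back. The sketch below is an illustration with made-up two-row data, and it writes to in-memory buffers (`io.StringIO`) instead of files so it leaves nothing on disk. It compares the columns that come back with and without the index being written.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# In-memory text buffers stand in for CSV files on disk.
with_index = io.StringIO()
without_index = io.StringIO()
df.to_csv(with_index)                   # default: the index is written as a first column
df.to_csv(without_index, index=False)   # the index is left out

with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)
print(cols_without)
```

When the index is written, it comes back as an extra unnamed column, which is why the tutorial passes `index=False`.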

6️⃣ Data Exploration

Checking First & Last Rows

In [128]:
print(df.head())  # First 5 rows
print(df.tail())  # Last 5 rows

      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30    Los Angeles   60000
2  Charlie   35        Chicago   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
      Name  Age           City  Salary
2  Charlie   35        Chicago   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
5    Frank   33        Houston   62000
6    Grace   29         Boston   58000
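
Both methods accept a row count, so the default of five can be changed. A short sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

first_two = df.head(2)   # first 2 rows
last_one = df.tail(1)    # last row only

print(first_two)
print(last_one)
```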


Basic Info & Statistics

In [129]:
df.info()              # Prints data types & non-null counts itself (it returns None)
print(df.describe())   # Summary statistics

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    7 non-null      object
 1   Age     7 non-null      int64 
 2   City    7 non-null      object
 3   Salary  7 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 356.0+ bytes
             Age        Salary
count   7.000000      7.000000
mean   31.428571  62142.857143
std     4.995236  10007.140308
min    25.000000  50000.000000
25%    28.500000  56500.000000
50%    30.000000  60000.000000
75%    34.000000  66000.000000
max    40.000000  80000.000000
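
By default `describe()` summarizes only the numeric columns, which is why `Name` and `City` are absent above. As a sketch with illustrative data, `include='all'` adds text columns to the summary, and `value_counts()` counts how often each value appears.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Chicago', 'New York'],
    'Age': [25, 30, 35],
})

summary = df.describe(include='all')   # adds count/unique/top/freq rows for text columns
city_counts = df['City'].value_counts()

print(summary)
print(city_counts)
```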


7️⃣ Selecting & Filtering Data

Selecting Columns

In [130]:
print(df['Name'])  # Selecting a single column
print(df[['Name', 'Age']])  # Selecting multiple columns

0      Alice
1        Bob
2    Charlie
3      David
4        Eve
5      Frank
6      Grace
Name: Name, dtype: object
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4      Eve   28
5    Frank   33
6    Grace   29


Selecting Rows (Using Indexing)

In [131]:
print(df.iloc[0])   # First row, selected by integer position
print(df.loc[1])    # Row whose index label is 1

Name         Alice
Age             25
City      New York
Salary       50000
Name: 0, dtype: object
Name              Bob
Age                30
City      Los Angeles
Salary          60000
Name: 1, dtype: object
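
With the default index, positions and labels happen to match, so the difference between `iloc` and `loc` is easy to miss. The sketch below uses illustrative data and shows them diverging once rows have been filtered out.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
older = df[df['Age'] > 25]                  # keeps the original labels 1 and 2

first_by_position = older['Name'].iloc[0]   # position 0 is now Bob
by_label = older['Name'].loc[1]             # label 1 is still Bob
has_label_zero = 0 in older.index           # label 0 was filtered out

print(first_by_position, by_label, has_label_zero)
```

Asking for `older.loc[0]` here would raise a `KeyError`, because no row carries the label 0 any more.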


Filtering Data

In [132]:
filtered_df = df[df['Age'] > 25]
print(filtered_df)

      Name  Age           City  Salary
1      Bob   30    Los Angeles   60000
2  Charlie   35        Chicago   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
5    Frank   33        Houston   62000
6    Grace   29         Boston   58000
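
Filters can be combined. This is a minimal sketch using a subset of the data above; the names `in_thirties` and `coastal` are illustrative. Each condition needs its own parentheses, with `&` for "and" and `|` for "or".

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco'],
})

# Both conditions must hold: age from 30 up to (not including) 40.
in_thirties = df[(df['Age'] >= 30) & (df['Age'] < 40)]

# isin() keeps rows whose value appears in the given list.
coastal = df[df['City'].isin(['New York', 'San Francisco'])]

print(in_thirties)
print(coastal)
```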


8️⃣ Modifying Data

Adding a New Column

In [133]:
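# Assigning to a new column name adds that column. 'Salary' already exists
# here (it came from data.csv), so this assignment overwrites it instead.
df['Salary'] = [50000, 60000, 70000, 80000, 55000, 62000, 58000]
print(df)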

      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30    Los Angeles   60000
2  Charlie   35        Chicago   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
5    Frank   33        Houston   62000
6    Grace   29         Boston   58000


Updating Values

In [134]:
df.loc[1, 'Age'] = 32  # Update age of Bob
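
The same `loc` syntax can update many rows at once when given a condition instead of a single label. In this sketch the data and the 5000 raise are hypothetical, chosen only to illustrate the pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

mask = df['Age'] > 30            # True for the rows to update
df.loc[mask, 'Salary'] += 5000   # only the matching rows change

print(df)
```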

Deleting a Column

In [135]:
df.drop(columns=['Salary'], inplace=True)
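
Without `inplace=True`, `drop` leaves the original untouched and returns a new DataFrame. A short sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

trimmed = df.drop(columns=['Salary'])   # new DataFrame without the column

print(list(df.columns))        # original still has Salary
print(list(trimmed.columns))   # copy does not
```

Assigning the result to a new name keeps the original available in case the column is needed later.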

9️⃣ Handling Missing Data

In [136]:
#Detect Missing Values
print(df.isnull().sum())  # Count missing values in each column

Name    0
Age     0
City    0
dtype: int64


In [137]:
# Fill missing values with 0 (no effect here, since nothing is missing)
df.fillna(0, inplace=True)

In [138]:
# Remove rows that contain missing values
df.dropna(inplace=True)
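
The tutorial's dataset has no gaps, so the calls above change nothing. To see them work, here is a sketch on a small made-up table that does contain missing values. The per-column fill values (the mean age and the string `'Unknown'`) are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, None, 35],                 # None becomes NaN
    'City': ['New York', None, 'Chicago'],
})

missing_per_column = df.isnull().sum()

# Fill each column with its own replacement value.
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

# Or drop every row that has at least one missing value.
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```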

Adding the Salary Column Back

In [139]:
df['Salary'] = [50000, 60000, 70000, 80000, 55000, 62000, 58000]
print(df)

      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   32    Los Angeles   60000
2  Charlie   35        Chicago   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
5    Frank   33        Houston   62000
6    Grace   29         Boston   58000


🔟 Grouping & Aggregation

In [140]:
# Grouping Data
grouped_df = df.groupby('City')[['Age', 'Salary']].mean()  # Select numeric columns only
print(grouped_df)

                Age   Salary
City                        
Boston         29.0  58000.0
Chicago        35.0  70000.0
Houston        33.0  62000.0
Los Angeles    32.0  60000.0
Miami          28.0  55000.0
New York       25.0  50000.0
San Francisco  40.0  80000.0
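
Every city above has exactly one row, so each group mean is just that row's value. Grouping is more informative when groups hold several rows. The sketch below uses made-up data with repeated cities and applies several aggregations at once with `agg`.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Salary': [50000, 60000, 70000, 80000],
})

# One row per city, one column per aggregation.
stats = df.groupby('City')['Salary'].agg(['mean', 'max', 'count'])

print(stats)
```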


In [141]:
# Aggregating Data
print(df['Age'].sum())    # Total age
print(df['Age'].mean())   # Average age
print(df['Age'].max())    # Maximum age

222
31.714285714285715
40


🔹 Merging & Joining DataFrames

In [142]:
# Concatenating DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

df_concat = pd.concat([df1, df2])
print(df_concat)

   A  B
0  1  3
1  2  4
0  5  7
1  6  8
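
The result above keeps each frame's original labels, so the index reads 0, 1, 0, 1. If unique row labels are wanted, one option is `ignore_index=True`, sketched here with the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Discard the original labels and number the rows 0..n-1.
df_concat = pd.concat([df1, df2], ignore_index=True)

print(df_concat)
```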


Merging DataFrames

In [143]:
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

df_merged = pd.merge(df1, df2, on='ID', how='inner')  # how can be 'inner', 'left', 'right', or 'outer'
print(df_merged)


   ID   Name  Salary
0   1  Alice   50000
1   2    Bob   60000
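
The inner join drops ID 3 and ID 4 because each appears in only one frame. As a sketch of the other `how` options on the same two frames, a left join keeps every row of `df1`, and an outer join keeps rows from both. The `indicator=True` argument adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Left join: keep all of df1; Salary is NaN where df2 has no matching ID.
left = pd.merge(df1, df2, on='ID', how='left')

# Outer join: keep rows from both sides and label their origin.
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left)
print(outer)
```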


Python Code to Create **employees.csv** file

In [144]:
import pandas as pd

# Create sample employee data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'Age': [25, 30, 35, 40, 28, 33, 29, 32, 27, 31],
    'City': ['New York', 'Los Angeles', 'New York', 'San Francisco', 'Miami',
             'Houston', 'Boston', 'New York', 'Chicago', 'Seattle'],
    'Salary': [50000, 60000, 70000, 80000, 55000, 62000, 58000, 75000, 67000, 72000]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Save to CSV file
df.to_csv('employees.csv', index=False)

print("employees.csv file created successfully!")


employees.csv file created successfully!


📌 Real-World Example
Analyzing a CSV File



In [145]:
df = pd.read_csv('employees.csv')

# Show top rows
print(df.head())

# Find average salary
print(df['Salary'].mean())

# Find employees from New York
ny_employees = df[df['City'] == 'New York']
print(ny_employees)

# Save to a new CSV file
ny_employees.to_csv('ny_employees.csv', index=False)

      Name  Age           City  Salary
0    Alice   25       New York   50000
1      Bob   30    Los Angeles   60000
2  Charlie   35       New York   70000
3    David   40  San Francisco   80000
4      Eve   28          Miami   55000
64900.0
      Name  Age      City  Salary
0    Alice   25  New York   50000
2  Charlie   35  New York   70000
7   Hannah   32  New York   75000
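
The same analysis can be taken a step further. The sketch below rebuilds the employee data in memory, so it does not depend on `employees.csv` being present, then ranks employees by salary and averages salary per city. The variable names `top_earners` and `salary_by_city` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve',
             'Frank', 'Grace', 'Hannah', 'Ian', 'Jack'],
    'City': ['New York', 'Los Angeles', 'New York', 'San Francisco', 'Miami',
             'Houston', 'Boston', 'New York', 'Chicago', 'Seattle'],
    'Salary': [50000, 60000, 70000, 80000, 55000,
               62000, 58000, 75000, 67000, 72000],
})

# Three highest-paid employees.
top_earners = df.sort_values('Salary', ascending=False).head(3)

# Average salary per city.
salary_by_city = df.groupby('City')['Salary'].mean()

print(top_earners)
print(salary_by_city)
```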


🎯 **Summary**

✅ **Pandas Series** – 1D labeled array

✅ **Pandas DataFrame** – 2D table-like structure

✅ **Data Manipulation** – Selecting, filtering, updating

✅ **Handling Missing Data** – Filling/removing missing values

✅ **Aggregation & Grouping** – Summarizing data by group

✅ **Reading/Writing Data** – Import/export CSV files

*Made with ❤️ by Lovnish Verma*