
In [3]:
# Pandas: a Python library built for data manipulation and analysis.
# It provides easy-to-use, flexible, and powerful data structures for
# working with structured data (CSV, Excel, SQL, JSON, etc.).


# Core Data Structures:

# Series:
# A one-dimensional labeled array capable of holding any data type.
# Comparable to a single column in a spreadsheet or in a DataFrame.
# Each element has an index label (0, 1, 2, ... by default, or custom labels).


# DataFrame:
# A two-dimensional labeled data structure.
# Each column is a Series, and all columns share the same row index.
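
A small sketch of how the two structures relate, using made-up product data (the column names and values are illustrative only). Selecting one column from a DataFrame returns a Series, and that Series carries the same row labels as the DataFrame.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {"price": [9.5, 12.0, 7.25], "stock": [4, 0, 11]},
    index=["apple", "banana", "cherry"],
)

# Selecting one column returns a Series.
price = df["price"]
print(type(price).__name__)

# The Series carries the same row labels as the DataFrame.
print(price.index.equals(df.index))

# Each column keeps its own dtype.
print(df.dtypes)
```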


# Benefits of Pandas:

# | Feature                            | Benefit                                               |
# | ---------------------------------- | ----------------------------------------------------- |
# | Intuitive data structures          | Easy to load, filter, and manipulate data             |
# | Handles missing data               | With tools like `.isnull()`, `.fillna()`, `.dropna()` |
# | Built-in plotting (via Matplotlib) | For quick visual inspection                           |
# | Integrates with NumPy              | You can apply NumPy functions easily                  |
# | Supports multiple formats          | CSV, Excel, JSON, SQL, Parquet, etc.                  |
# | Data summarization                 | `.describe()`, `.groupby()`, `.pivot_table()`         |
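
To make a few rows of the table concrete, here is a minimal sketch on a tiny hand-made DataFrame (the `city` and `sales` columns are invented for illustration). It shows `.isnull()`, `.fillna()`, `.dropna()`, and a `.groupby()` summary.

```python
import numpy as np
import pandas as pd

# Invented data with one missing value.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "sales": [100.0, np.nan, 80.0, 120.0],
})

# Count missing values per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Option 1: replace missing sales with 0.
filled = df.fillna({"sales": 0})

# Option 2: drop any row that contains a missing value.
dropped = df.dropna()

# Summarize: mean sales per city (missing values are skipped by default).
mean_by_city = df.groupby("city")["sales"].mean()
print(mean_by_city)
```

Whether to fill or drop missing values depends on the dataset; both calls return a new DataFrame and leave `df` unchanged.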


# Use Cases:

# Reading and cleaning raw data
# Filtering and transforming datasets
# Exploratory Data Analysis (EDA)
# Time-series analysis
# Data aggregation and reshaping
# Exporting cleaned data


# Workflow with Pandas:

# Load the data: pd.read_csv(), pd.read_excel(), etc.
# Understand the data: .info(), .head(), .describe()
# Clean the data: handle missing values, drop duplicates
# Analyze: filtering, sorting, grouping, aggregating
# Visualize or export: .plot(), .to_csv(), etc.
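
The five steps above can be sketched end to end. This example reads CSV text from an in-memory buffer rather than a real file so that it runs anywhere. The column names and values are invented, and a real project would pass a file path to `pd.read_csv()` instead.

```python
import io
import pandas as pd

# Invented CSV content standing in for a real file.
csv_text = """name,dept,salary
Asha,IT,50000
Ravi,HR,
Asha,IT,50000
Meena,IT,65000
Kiran,HR,40000
"""

# 1. Load the data.
df = pd.read_csv(io.StringIO(csv_text))

# 2. Understand the data.
df.info()
print(df.head())

# 3. Clean: drop duplicate rows, then fill the missing salary with the median.
clean = df.drop_duplicates()
clean = clean.fillna({"salary": clean["salary"].median()})

# 4. Analyze: average salary per department, highest first.
summary = clean.groupby("dept")["salary"].mean().sort_values(ascending=False)
print(summary)

# 5. Export: with no path given, to_csv() returns the CSV as a string.
csv_out = summary.to_csv()
print(csv_out)
```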



In [6]:
# Importing Pandas Library
import pandas as pd

# Creating a Series:

# 1. From a Python list (default index 0,1,2...)
s = pd.Series([10, 20, 30, 40])
print(s)

# 2. Custom index labels
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s2)


0    10
1    20
2    30
3    40
dtype: int64
a    10
b    20
c    30
dtype: int64
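
Besides lists, a Series can also be built from a Python dictionary, in which case the keys become the index labels. The sketch below is an addition to the notebook's two examples (the variable name `s3` is arbitrary) and also shows how to inspect the labels and the values separately.

```python
import pandas as pd

# 3. From a dictionary: keys become index labels, values become data.
s3 = pd.Series({"a": 10, "b": 20, "c": 30})
print(s3)

# Inspect the labels and the values separately.
print(s3.index.tolist())   # ['a', 'b', 'c']
print(s3.to_numpy())       # [10 20 30]
```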


In [7]:
# Practice Exercise:

# Create a Series for monthly sales: [250, 300, 150, 400] with index labels as ['Jan', 'Feb', 'Mar', 'Apr'].
# Print the Series and the sales for February.

monthly_sales = pd.Series([250, 300, 150, 400], index=['Jan', 'Feb', 'Mar', 'Apr'])
print(monthly_sales)
print(monthly_sales['Feb'])

Jan    250
Feb    300
Mar    150
Apr    400
dtype: int64
300


In [9]:
# Basic Operations on Series

# Arithmetic operations (addition, subtraction, multiplication, division) between Series, or between Series and scalars.
# Accessing elements using .loc[] and .iloc[]
# Filtering with conditions
# Applying functions on Series (like .apply(), .map())


# .loc[]: label-based indexing

# Accesses rows and columns by their labels (names).
# Labels must be given exactly as they appear in the index.
# Works with row and column labels for a DataFrame, or index labels for a Series.
# Accepts a single label, a list of labels, or a label slice (the end label is included).


# .iloc[]: position-based indexing

# Accesses rows and columns by integer position (0-based).
# Ignores labels entirely; selection is purely positional.
# Accepts an integer, a list of integers, or an integer slice (the end position is excluded).



# Example:
sales = pd.Series([250, 300, 150, 400], index=['Jan', 'Feb', 'Mar', 'Apr'])

# Add 50 to each month's sales
print(sales + 50)

# Access sales for March using loc and iloc
print(sales.loc['Mar'])
print(sales.iloc[2])

# Filter months where sales > 200
print(sales[sales > 200])

# Apply a 10% discount to each value
def discount(x):
    return x * 0.9

print(sales.apply(discount))


Jan    300
Feb    350
Mar    200
Apr    450
dtype: int64
150
150
Jan    250
Feb    300
Apr    400
dtype: int64
Jan    225.0
Feb    270.0
Mar    135.0
Apr    360.0
dtype: float64
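
The difference in slice behaviour between `.loc[]` and `.iloc[]` noted in the comments above is easy to miss, so here is a short sketch reusing the same monthly sales values. It also illustrates arithmetic between two Series, which pandas aligns by index label. The `bonus` Series is a made-up example.

```python
import pandas as pd

sales = pd.Series([250, 300, 150, 400], index=["Jan", "Feb", "Mar", "Apr"])

# Label slice: the end label 'Mar' is included (3 entries).
by_label = sales.loc["Jan":"Mar"]

# Position slice: the end position 2 is excluded (2 entries).
by_position = sales.iloc[0:2]

# A list of labels selects exactly those entries.
picked = sales.loc[["Jan", "Apr"]]

# Series + Series aligns on labels; labels missing from one side give NaN.
bonus = pd.Series([10, 20], index=["Jan", "Feb"])  # hypothetical values
total = sales + bonus

print(by_label, by_position, picked, total, sep="\n\n")
```

Because `Mar` and `Apr` have no matching label in `bonus`, their totals come out as NaN rather than raising an error, which is worth checking for after combining Series with different indexes.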


In [23]:
# Practice Exercises

# Q1. Create a Pandas Series for daily temperatures (in °C): [22, 25, 20, 18, 24] with index labels ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'].
# Print the Series.
# Print the temperature for Wednesday.
temp = pd.Series([22, 25, 20, 18, 24], index=["Mon", "Tue", "Wed", "Thu", "Fri"])
print(temp)
print("Temperature on Wednesday:", temp["Wed"])
print("\n")


# Q2. Using the above Series:
# Increase all temperatures by 2 degrees and print the result.
# Find days where temperature is greater than 22°C.
# Use .apply() to convert all temperatures to Fahrenheit using the formula: F = (C * 9/5) + 32.
print("Original Series:")
print(temp)
print("Increase all Elements by 2:")
print(temp + 2)
print("\n")
# Use the boolean mask to select only the days above 22°C
print("Hot Days:")
print(temp[temp > 22])
print("\n")
print("Fahrenheit Series:")
print(temp.apply(lambda x: (x * 9/5) + 32))
print("\n")

Mon    22
Tue    25
Wed    20
Thu    18
Fri    24
dtype: int64
Temperature on Wednesday: 20


Original Series:
Mon    22
Tue    25
Wed    20
Thu    18
Fri    24
dtype: int64
Increase all Elements by 2:
Mon    24
Tue    27
Wed    22
Thu    20
Fri    26
dtype: int64


Hot Days:
Tue    25
Fri    24
dtype: int64


Fahrenheit Series:
Mon    71.6
Tue    77.0
Wed    68.0
Thu    64.4
Fri    75.2
dtype: float64
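
As a side note to Q2, the Fahrenheit conversion does not strictly need `.apply()`: arithmetic operators on a Series work element-wise, so the formula can be written directly. The sketch below compares the two approaches and uses `.map()` with a dictionary to relabel values. The `hot` and `mild` labels are illustrative choices, not part of the exercise.

```python
import pandas as pd

temp = pd.Series([22, 25, 20, 18, 24], index=["Mon", "Tue", "Wed", "Thu", "Fri"])

# Element-wise arithmetic on the whole Series.
fahrenheit_vectorized = temp * 9 / 5 + 32

# The same formula applied one element at a time.
fahrenheit_apply = temp.apply(lambda c: (c * 9 / 5) + 32)

# Largest difference between the two results.
print((fahrenheit_vectorized - fahrenheit_apply).abs().max())

# .map() with a dict translates each value of a boolean Series to a label.
labels = (temp > 22).map({True: "hot", False: "mild"})
print(labels)
```

On larger Series the element-wise form is usually the faster of the two, since it avoids calling a Python function once per element.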




In [None]:
# DataFrame: a 2-dimensional labeled data structure with columns of
# potentially different types (like a spreadsheet or SQL table).

# Key Features:

# Tabular data with rows and columns
# Each column can have a different data type (int, float, string, etc.)
# Row and column labels (indices)
# Supports easy data manipulation, selection, filtering, and aggregation


# Creating a DataFrame: DataFrames can be created from:

# Dictionaries
# Lists
# NumPy arrays
# CSV files
# Excel files and more.
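
A brief sketch of three of the in-memory construction routes listed above. All names and values are invented for illustration. Reading from CSV or Excel would use `pd.read_csv()` or `pd.read_excel()` with a real file path.

```python
import numpy as np
import pandas as pd

# From a dictionary of columns.
df_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "age": [28, 35]})

# From a list of rows, with column names supplied separately.
df_list = pd.DataFrame([["Asha", 28], ["Ravi", 35]], columns=["name", "age"])

# From a NumPy array (all values share one dtype).
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

print(df_dict)
print(df_list)
print(df_array)
```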


# Basic Operations:

# Access columns as attributes or by keys
# Access rows using .loc[] (label-based) or .iloc[] (integer position-based)
# Filtering rows with conditions
# Adding/deleting columns
# Summary statistics like .mean(), .sum(), .describe()
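
The operations listed above, sketched on a small invented table (the `name`, `dept`, and `salary` columns and the `e1` to `e3` row labels are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meena"],
        "dept": ["IT", "HR", "IT"],
        "salary": [50000, 40000, 65000],
    },
    index=["e1", "e2", "e3"],
)

# Column access: by key (always works) or as an attribute (valid identifiers only).
salaries = df["salary"]
same_salaries = df.salary

# Row access: by label with .loc[], by position with .iloc[].
row_by_label = df.loc["e2"]
row_by_position = df.iloc[1]

# Filter rows with a boolean condition.
it_staff = df[df["dept"] == "IT"]

# Add a derived column, then delete it again.
df["bonus"] = df["salary"] * 0.10
df = df.drop(columns=["bonus"])

# Summary statistics.
print(df["salary"].mean(), df["salary"].sum())
print(df.describe())
```

Key-based access (`df["salary"]`) is the safer habit, because attribute access fails for column names that contain spaces or clash with existing DataFrame methods.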

# Why use DataFrames?
# Essential for handling real-world tabular data in data analysis, cleaning, and preprocessing workflows.

