<img src="LaeCodes.png" 
     align="center" 
     width="100" />

**Introduction to Pandas** <br>

Pandas is a flexible open-source data analysis and manipulation library for Python. It is widely used in data science for working with structured data due to its easy-to-use data structures and robust functionality. Pandas provides tools for reading and writing data, cleaning and preparing data, and performing complex data analysis.

**Key Features of Pandas** <br>
1. **Data Structures:** <br>
o **Series:** A one-dimensional labeled array capable of holding any data type.

In [2]:
import pandas as pd
s = pd.Series([1, 2, 3, 4, 5])

**Attributes:** <br>
• **values:** Returns the values in the series. <br>
• **index:** Returns the index of the series. <br>
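As a minimal sketch, the two attributes can be inspected on the Series created above. With no explicit index, pandas assigns the default integer labels starting at 0.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# .values exposes the underlying data as a NumPy array
print(s.values)

# .index holds the row labels; here the default labels 0 through 4
print(list(s.index))
```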

o **DataFrame:** A two-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a relational database or an Excel spreadsheet.

In [3]:
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
print(df)

   A  B
0  1  4
1  2  5
2  3  6


**Attributes:** <br>
• **columns:** Returns the column labels. <br>
• **index:** Returns the row labels. <br>
• **values:** Returns the underlying data as a NumPy array. <br>
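The three attributes can be checked on the same small DataFrame. This is only an illustrative sketch; the variable names `cols`, `rows`, and `arr` are introduced here for convenience.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

cols = list(df.columns)  # column labels
rows = list(df.index)    # row labels (default integer index)
arr = df.values          # underlying data as a 2-D NumPy array

print(cols)
print(rows)
print(arr.shape)
```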

2. **Data Handling and Manipulation:** <br>
o Reading/Writing Data: Easily read from and write to various file formats, including CSV, Excel, SQL, and JSON. <br>
o Data Cleaning: Handling missing data, removing duplicates, and filtering data. <br>
o Data Transformation: Applying functions, merging/joining datasets, grouping data, and pivoting tables. <br>

3. **Indexing and Selection:** <br>
o Powerful tools for selecting, filtering, and slicing data, allowing for intuitive data access and manipulation. <br>

4. **Time Series Handling:** <br>
o Specialized functions and tools to work with time series data, including date range generation, frequency conversion, and resampling. <br>

5. **Integration with Other Libraries:** <br>
o Works seamlessly with NumPy for numerical operations and matplotlib for data visualization. <br>
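To make the time series feature more concrete, here is a small sketch of date range generation and resampling. The dates and values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily readings over six consecutive days
dates = pd.date_range('2024-01-01', periods=6, freq='D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=dates)

# Resample into 2-day bins and sum the values in each bin
two_day = ts.resample('2D').sum()
print(two_day)
```

Each bin covers two consecutive days, so the six daily values collapse into three sums.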

**Common Uses of Pandas in Data Science** <br>
• **Data Cleaning and Preparation:** <br>
o Handling missing values, converting data types, and normalizing data. <br>
o Removing duplicates and handling outliers. <br>
• **Exploratory Data Analysis (EDA):** <br>
o Summarizing data and computing descriptive statistics. <br>
o Visualizing data distributions and relationships using plots. <br>
• **Data Transformation:** <br>
o Applying functions to data, grouping and aggregating data, and reshaping data structures. <br>
o Creating new calculated columns based on existing data. <br>
• **Merging and Joining Data:** <br>
o Combining multiple datasets through joins and merges, similar to SQL operations. <br>
• **Time Series Analysis:** <br>
o Handling date-time data, performing rolling window calculations, and resampling time series data. <br>
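Two of the transformation tasks listed above, creating a calculated column and grouping with aggregation, might look like the following sketch. The `tips` data is a hypothetical sample, and `tip_ratio` is a column name chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data
tips = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.0, 3.0, 3.0, 8.0],
    'sex': ['Female', 'Male', 'Male', 'Female'],
})

# New calculated column based on existing columns
tips['tip_ratio'] = tips['tip'] / tips['total_bill']

# Group by a category and aggregate
mean_tip = tips.groupby('sex')['tip'].mean()
print(mean_tip)
```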


**Creating DataFrames from Various Data Sources** <br>
• **From Dictionaries:**

In [10]:
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

• **From CSV Files:**

In [6]:
df = pd.read_csv('data.csv', delimiter=';')
print(df)

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner   2.0
1       10.34  1.66    Male     No  Sun  Dinner   3.0
2       21.01  3.50    Male     No  Sun  Dinner   3.0
3       23.68  3.31    Male     No  Sun  Dinner   2.0
4       24.59  3.61  Female     No  Sun  Dinner   4.0
5         NaN   NaN     NaN    NaN  NaN     NaN   NaN
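The all-NaN row at index 5 in the output above comes from a line in the file that contains only delimiters. A minimal sketch of how such a row arises and how it can be removed, using an in-memory string in place of `data.csv` (the text is a shortened, hypothetical stand-in for the file):

```python
import io
import pandas as pd

# Shortened stand-in for the file; the last line holds only delimiters
csv_text = "total_bill;tip;size\n16.99;1.01;2\n10.34;1.66;3\n;;\n"

df_csv = pd.read_csv(io.StringIO(csv_text), sep=';')

# Drop rows in which every column is missing
cleaned = df_csv.dropna(how='all')
print(cleaned)
```

Because the `size` column contains a missing value, pandas reads it as float, which matches the `2.0`, `3.0` values shown in the output above.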


• **From Excel Files:**

In [8]:
import pandas as pd

df = pd.read_excel('data.xlsx', engine='openpyxl')

print(df.head())

   A  B  C   D
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12


• **From JSON Files:**

In [5]:
import pandas as pd

# Check the file content first
file_path = 'data.json'

# Read the file content to debug
try:
    with open(file_path, 'r') as file:
        content = file.read()
        print("File Content:", content)
except FileNotFoundError:
    print(f"The file {file_path} does not exist.")
except Exception as e:
    print(f"An error occurred while reading the file: {e}")

# If the content looks valid, try to load it as JSON
try:
    df = pd.read_json(file_path)
    print(df.head())
except ValueError as ve:
    print(f"ValueError: {ve}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

File Content: total_bill;tip;sex;smoker;day;time;size
16.99;1.01;Female;No;Sun;Dinner;2
10.34;1.66;Male;No;Sun;Dinner;3
21.01;3.50;Male;No;Sun;Dinner;3
23.68;3.31;Male;No;Sun;Dinner;2
24.59;3.61;Female;No;Sun;Dinner;4
;;;;;;
ValueError: Expected object or value
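The `ValueError` above occurs because the file printed in the output contains semicolon-delimited text rather than JSON. For comparison, here is a sketch of `pd.read_json` on valid JSON, using an in-memory string so that it does not depend on any file. The records are a hypothetical subset of the data shown earlier.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of records
json_text = '[{"total_bill": 16.99, "tip": 1.01}, {"total_bill": 10.34, "tip": 1.66}]'

df_json = pd.read_json(io.StringIO(json_text))
print(df_json.head())
```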


• **From SQL Databases:**

In [None]:
import sqlite3
conn = sqlite3.connect('database.db')
df = pd.read_sql_query("SELECT * FROM table_name", conn)  # table_name is a placeholder
conn.close()  # release the connection once the data is loaded
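Since the cell above depends on an existing `database.db`, a self-contained variant may be easier to try. The sketch below assumes an in-memory SQLite database and a small table created only for this example.

```python
import sqlite3
import pandas as pd

# In-memory database, so no file is needed
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE table_name (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [(1, 4), (2, 5), (3, 6)])
conn.commit()

# Load the query result into a DataFrame
df_sql = pd.read_sql_query("SELECT * FROM table_name", conn)
conn.close()

print(df_sql)
```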

**DataFrame Indexing and Slicing** <br>
**Selecting Data by Label:** <br>
• Using loc for label-based indexing:

In [14]:
df.loc[0, 'A']  # Access element at row 0, column 'A'
df.loc[:, 'A']  # Access all rows of column 'A'
df.loc[0]       # Access all columns of row 0 (only this last expression is displayed)

A    1
B    4
Name: 0, dtype: int64

**Selecting Data by Position:** <br>
• Using iloc for position-based indexing:

In [16]:
df.iloc[0, 0]   # Access element at first row, first column
df.iloc[:, 0]   # Access all rows of first column
df.iloc[0]      # Access all columns of first row (only this last expression is displayed)

A    1
B    4
Name: 0, dtype: int64
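With the default integer index, `loc` and `iloc` return the same rows, which can hide the difference between them. The sketch below uses a hypothetical string index to separate the two. Note that label slices include the end label, while position slices exclude the end position.

```python
import pandas as pd

# Hypothetical string labels so that labels and positions differ
df_labeled = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

by_label = df_labeled.loc['y', 'B']    # row labeled 'y', column 'B'
by_position = df_labeled.iloc[1, 1]    # second row, second column

label_slice = df_labeled.loc['x':'y']  # end label included: rows 'x' and 'y'
pos_slice = df_labeled.iloc[0:1]       # end position excluded: row 'x' only

print(by_label, by_position)
print(len(label_slice), len(pos_slice))
```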

**Boolean Indexing:** <br>
• Using conditions to filter data:

In [None]:
df[df['A'] > 2]  # Returns rows where column 'A' values are greater than 2
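Conditions can also be combined. A brief sketch: use `&` for "and" and `|` for "or", and wrap each condition in parentheses, because these operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Rows where both conditions hold
both = df[(df['A'] > 1) & (df['B'] < 6)]

# Rows where at least one condition holds
either = df[(df['A'] > 2) | (df['B'] == 4)]

print(both)
print(either)
```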

**Setting Values:** <br>
• Modifying values based on index:

In [12]:
df.loc[0, 'A'] = 10  # Set value at row 0, column 'A' to 10
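The same `loc` syntax accepts a boolean condition in place of a row label, which sets values for every matching row at once. A minimal sketch on a fresh copy of the example data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Set column 'B' to 0 wherever column 'A' is greater than 1
df.loc[df['A'] > 1, 'B'] = 0

print(df)
```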

**Basic Operations on DataFrames** <br>
**Adding and Removing Columns:** <br>
• Adding a new column:

In [19]:
df['C'] = [7, 8, 9]
print(df)

    A  B  C
0  10  4  7
1   2  5  8
2   3  6  9


• **Removing a column:**

In [20]:
df.drop('C', axis=1, inplace=True)

• **Filtering Rows:** <br>
Filtering based on conditions:

In [21]:
filtered_df = df[df['A'] > 1]

• **Sorting Data:** <br>
Sorting by column values:

In [22]:
df.sort_values(by='A', ascending=False)

    A  B
0  10  4
2   3  6
1   2  5


Sorting by index:

In [24]:
df.sort_index()

    A  B
0  10  4
1   2  5
2   3  6


• **Descriptive Statistics:** <br>
Summary statistics for DataFrame:

In [25]:
df.describe()

              A    B
count       3.0  3.0
mean        5.0  5.0
std    4.358899  1.0
min         2.0  4.0
25%         2.5  4.5
50%         3.0  5.0
75%         6.5  5.5
max        10.0  6.0


**Data Cleaning and Preparation** <br>
**Handling Missing Data:** <br>
• Detecting missing values:

In [26]:
df.isnull()

       A      B
0  False  False
1  False  False
2  False  False


• **Dropping missing values:**

In [27]:
df.dropna()

    A  B
0  10  4
1   2  5
2   3  6


• **Filling missing values:**

In [28]:
df.fillna(value=0)

    A  B
0  10  4
1   2  5
2   3  6
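The example DataFrame has no missing values, so the three calls above return it unchanged. The sketch below uses a small hypothetical DataFrame that does contain gaps to show the effect of each method.

```python
import pandas as pd

# Hypothetical data with gaps; None becomes NaN in a numeric column
df_missing = pd.DataFrame({'A': [1.0, None, 3.0], 'B': [4.0, 5.0, None]})

mask = df_missing.isnull()           # True where a value is missing
dropped = df_missing.dropna()        # keeps only rows without missing values
filled = df_missing.fillna(value=0)  # replaces missing values with 0

print(mask)
print(dropped)
print(filled)
```

None of these methods modifies `df_missing` itself; each returns a new object.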


**Data Transformation:** <br>
• **Renaming columns:**

In [18]:
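df.rename(columns={'A': 'Column_A'})  # returns a renamed copy; df itself is not modified
df  # still has column 'A' because the result above was not assigned

      A  B
0  10.0  4
1   2.0  5
2   3.0  6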


• **Changing data types:**

In [16]:
df['A'] = df['A'].astype(float)
df

      A  B
0  10.0  4
1   2.0  5
2   3.0  6


**Combining and Merging DataFrames:** <br>
• **Concatenating DataFrames:**

In [21]:
data1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
data2 = {'C': [7, 8, 9], 'D': [10, 11, 12]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

pd.concat([df1, df2], axis=1)

   A  B  C   D
0  1  4  7  10
1  2  5  8  11
2  3  6  9  12
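The call above places the DataFrames side by side with `axis=1`. As a complementary sketch, `axis=0` stacks rows instead. The `top` and `bottom` frames are hypothetical, and `ignore_index=True` renumbers the rows of the result.

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})
bottom = pd.DataFrame({'A': [3], 'B': [6]})

# Stack rows; ignore_index=True gives a fresh 0..n-1 index
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked)
```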


• **Merging DataFrames:**

In [35]:
data1 = {'key': [1, 2, 3], 'A': [1, 2, 3], 'B': [4, 5, 6]}
data2 = {'key': [1, 2, 3], 'C': [7, 8, 9], 'D': [10, 11, 12]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Merge the DataFrames on the 'key' column
df_merged = pd.merge(df1, df2, on='key')

print(df_merged)

   key  A  B  C   D
0    1  1  4  7  10
1    2  2  5  8  11
2    3  3  6  9  12
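In the example above, every key appears in both DataFrames, so the join type makes no difference. The following sketch uses hypothetical frames with partly different keys to show how the `how` parameter of `pd.merge` changes the result.

```python
import pandas as pd

left = pd.DataFrame({'key': [1, 2, 3], 'A': [1, 2, 3]})
right = pd.DataFrame({'key': [2, 3, 4], 'C': [8, 9, 10]})

# Default is an inner join: only keys present in both frames
inner = pd.merge(left, right, on='key')

# Outer join: all keys from either frame, NaN where a side has no match
outer = pd.merge(left, right, on='key', how='outer')

# Left join: all keys from the left frame
left_join = pd.merge(left, right, on='key', how='left')

print(inner)
print(outer)
print(left_join)
```

These join types correspond to the INNER, FULL OUTER, and LEFT joins of SQL.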
