# Python Pandas

### What's Pandas

- Pandas is an open-source data analysis and manipulation library for Python.
- It provides data structures and functions needed to manipulate structured data seamlessly.
- Pandas is built on top of NumPy and integrates well with other libraries such as Matplotlib and Scikit-learn.

In [None]:
# Import the warnings module, which provides a way to handle warning messages
import warnings  

# Suppress all warning messages to prevent them from being displayed  
# This is useful in scenarios where warnings are not critical and can clutter the output
warnings.filterwarnings("ignore")  

In [None]:
# Import the pandas library for data manipulation and analysis
import pandas as pd  

# Import the numpy library for numerical computing and working with arrays
import numpy as np  

### Pandas Data Structures

- Pandas primarily provides two data structures:
  1. Series: 1-dimensional labeled array capable of holding any data type.
  2. DataFrame: 2-dimensional labeled data structure with columns of potentially different types.

### How to Create Series

- Series can be created from lists, dictionaries, or scalar values.

In [None]:
# Example: Creating Series from a list
data = [1, 2, 3, 4, 5]
series_from_list = pd.Series(data)
print(series_from_list)

In [None]:
# Example: Creating Series from a dictionary
data = {'a': 1, 'b': 2, 'c': 3}
series_from_dict = pd.Series(data)
series_from_dict
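The list and dictionary cases are shown above; the scalar case can be sketched as follows. This is a minimal illustration, and the variable name `series_from_scalar` and the labels are chosen here only for the example.

```python
import pandas as pd

# A scalar value is repeated once for every label in the supplied index
series_from_scalar = pd.Series(7, index=['a', 'b', 'c'])
print(series_from_scalar)
```

Without an `index` argument, a scalar produces a Series with a single element.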

In [None]:
# Practice Questions:
# 1. Create a Pandas Series from a list of numbers [10, 20, 30, 40, 50].
# 2. Create a Pandas Series from a dictionary {'x': 100, 'y': 200, 'z': 300}.

### Series Changing Index

- You can change the index of a Series by assigning to its `index` attribute.

In [None]:
# Example: Changing index of a Series
data = [1, 2, 3, 4, 5]
new_data = pd.Series(data)
new_data.index = ['a', 'b', 'c', 'd', 'e']
print(new_data)

In [None]:
# Reset to the default integer index; drop=True discards the old labels,
# so the result stays a Series instead of becoming a DataFrame
new_data = new_data.reset_index(drop=True)
new_data
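As a complement to assigning `index` after the fact, labels can also be supplied when the Series is created, and individual labels can be renamed. The sketch below uses illustrative names (`labeled`, `relabeled`) that are not part of the notebook.

```python
import pandas as pd

# Supply the labels at creation time instead of assigning them afterwards
labeled = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Look up a value by its label
value_b = labeled['b']

# rename() returns a new Series; the original keeps its labels
relabeled = labeled.rename({'a': 'x'})

print(value_b)
print(relabeled)
```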

In [None]:
# Practice Questions:
# 1. Create a Pandas Series and change its index to ['p', 'q', 'r', 's', 't'].
# 2. Explain why changing the index of a Series might be useful.

### Series Addition

- Series support element-wise operations, including addition.

In [None]:
# Example: Adding two Series
series1 = pd.Series([1, 2, 3])
series2 = pd.Series([4, 5, 6])
result = series1 + series2
print(result)
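Addition is only one of the element-wise operations. A short sketch of a few others, using the same two Series values as above (the result names are illustrative):

```python
import pandas as pd

series1 = pd.Series([1, 2, 3])
series2 = pd.Series([4, 5, 6])

# Each operation is applied pairwise to elements that share a label
difference = series2 - series1
product = series1 * series2

# A scalar operand is applied to every element
scaled = series1 * 10

print(difference)
print(product)
print(scaled)
```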

In [None]:
# Practice Questions:
# 1. Create two Pandas Series and perform element-wise addition.
# 2. What happens if the Series have different indices?

### What's a DataFrame

- A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
- It is similar to a table in a database or an Excel spreadsheet.

### How to Create DataFrames

- DataFrames can be created from dictionaries, lists of dictionaries, or other DataFrames.

In [None]:
# Example: Creating DataFrame from a dictionary
data = {'name': ['Suki', 'Mai', 'Azula'], 'age': [15, 14, 16]}
df1 = pd.DataFrame(data)
df1
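The other two sources mentioned above can be sketched the same way. The records below are made-up sample data, and the names `records`, `df_from_records`, and `df_from_df` are illustrative only.

```python
import pandas as pd

# Each dictionary becomes one row; keys become column names
records = [
    {'item': 'pen', 'qty': 3},
    {'item': 'book'},  # a missing key is filled with NaN
]
df_from_records = pd.DataFrame(records)

# An existing DataFrame can also be passed to the constructor
df_from_df = pd.DataFrame(df_from_records)

print(df_from_records)
```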

In [None]:
# Practice Questions:
# 1. Create a Pandas DataFrame from a dictionary with keys 'product' and 'price'.
# 2. Create a Pandas DataFrame from a list of dictionaries.

### Exporting DataFrame into CSV

- DataFrames can be exported to CSV files using the `to_csv()` method.

In [None]:
# Example: Exporting DataFrame to CSV
df1.to_csv('data1.csv', index=False)

In [None]:
# Practice Questions:
# 1. Write a Pandas DataFrame to a CSV file named 'output.csv'.
# 2. Explain the purpose of the 'index' parameter in the to_csv() method.

### Importing CSV into DataFrame

- CSV files can be imported into DataFrames using the `pd.read_csv()` function.

In [None]:
df2 = pd.read_csv('data1.csv')
df2
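To try the export and import steps together without touching the file system, an in-memory text buffer can stand in for the file. This is a sketch that assumes `io.StringIO` is an acceptable substitute for a path; the names `buffer` and `restored` are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ['Suki', 'Mai'], 'age': [15, 14]})

# Write the CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back into a DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```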

In [None]:
# Import the Titanic dataset
titanic_df = pd.read_csv('train.csv')

In [None]:
# Display the dataset
titanic_df

### Getting Information and Description from DataFrame

- Use the `head()` method to view the first few rows, the `info()` method to get a summary of the DataFrame, and `describe()` to get statistical details.

In [None]:
titanic_df.head()  # Display the first 5 rows

In [None]:
titanic_df.tail()  # Display the last 5 rows

In [None]:
titanic_df.sample(5)  # Display 5 randomly chosen rows

In [None]:
titanic_df.iloc[[500]]  # Return the row at integer position 500 as a one-row DataFrame

In [None]:
# pd.set_option('display.max_rows', None) # Make Pandas display all the rows
# pd.reset_option('display.max_rows')  # As it was in the beginning

In [None]:
titanic_df

In [None]:
# Display a concise summary of the Titanic DataFrame
titanic_df.info()

In [None]:
# Generate summary statistics for numerical columns in the Titanic DataFrame
# This includes count, mean, standard deviation, min, max, and quartiles
titanic_df.describe()

In [None]:
# Generate summary statistics for only the categorical (object) columns in the Titanic DataFrame
# This includes count, unique values, most frequent value (top), and its frequency (freq)
titanic_df.describe(include='object')

In [None]:
# Generate summary statistics for all columns (both numerical and categorical) in the Titanic DataFrame
# The include='all' parameter ensures that all data types are considered
# The .T attribute transposes the result, swapping rows and columns for better readability
titanic_df.describe(include='all').T
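The cells above depend on `train.csv` being available. For experimenting without it, the same methods can be tried on a small made-up DataFrame; the data and the names `people`, `first_two`, and `stats` below are purely illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Suki', 'Mai', 'Azula', 'Bolu'],
    'age': [15, 14, 16, 17],
})

# head(n) returns the first n rows (5 when n is omitted)
first_two = people.head(2)

# info() prints column names, non-null counts, and dtypes
people.info()

# describe() summarizes the numeric columns by default
stats = people.describe()
mean_age = stats.loc['mean', 'age']

print(first_two)
print(stats)
```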

In [None]:
# Practice Questions:
# 1. Create a DataFrame and use the info() method to get its summary.
# 2. Use the describe() method to get statistical details of a DataFrame.

### DataFrame Bracket Selection

- You can select columns of a DataFrame using bracket notation.

In [None]:
df2

In [None]:
# Example: Selecting column
df2['name']  # Single column

In [None]:
df2[['name', 'age']]  # Multiple columns

In [None]:
titanic_df[['Age', 'Name', 'Sex']]
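One detail worth seeing in code is that single brackets and double brackets return different types. A small sketch with illustrative names (`single`, `as_frame`):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Suki', 'Mai', 'Azula'], 'age': [15, 14, 16]})

# A single label returns a Series
single = df['name']

# A list of labels returns a DataFrame, even when the list has one entry
as_frame = df[['name']]

print(type(single).__name__)
print(type(as_frame).__name__)
```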

In [None]:
# Practice Questions:
# 1. Select a single column from a DataFrame using bracket notation.
# 2. Select multiple columns from a DataFrame using bracket notation.

### Setting Index in DataFrame

- You can set a column as the index of a DataFrame using the `set_index()` method.

In [None]:
# Example: Setting index
df2.set_index('name', inplace=True)

In [None]:
df2
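The example above modifies the DataFrame in place. An alternative sketch, assuming you would rather keep the original unchanged, uses the returned copy and `reset_index()` to undo the change (the names `indexed` and `restored` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Suki', 'Mai', 'Azula'], 'age': [15, 14, 16]})

# Without inplace=True, set_index() returns a new DataFrame
indexed = df.set_index('name')

# reset_index() moves the index back into an ordinary column
restored = indexed.reset_index()

print(indexed)
print(restored)
```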

In [None]:
# Practice Questions:
# 1. Set the index of a DataFrame to one of its columns.
# 2. Explain why setting an index might be useful.

### DataFrame loc/iloc

- `loc` selects by label, while `iloc` selects by integer position (0-based).

In [None]:
# Example: Using loc and iloc
print(df2.loc['Suki'])  # Label-based: the row labeled 'Suki'
print(df2.iloc[0])  # Position-based: the first row

In [None]:
df2.loc[['Azula', 'Mai']]
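The difference between the two is easiest to see when the index labels are integers that do not match the row positions. The DataFrame below is a made-up illustration:

```python
import pandas as pd

# The labels 2, 0, 1 deliberately differ from the positions 0, 1, 2
df = pd.DataFrame({'score': [10, 20, 30]}, index=[2, 0, 1])

# loc looks up the row whose label is 0
by_label = df.loc[0, 'score']

# iloc looks up the first row and first column by position
by_position = df.iloc[0, 0]

print(by_label)
print(by_position)
```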

In [None]:
# Practice Questions:
# 1. Use loc to select rows by labels.
# 2. Use iloc to select rows by integer positions.

### Add New Column to DataFrame

In [None]:
df2['Gender'] = ['female', 'female', 'female']
df2
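A new column does not have to be typed out as a list; it can also be computed from existing columns. A minimal sketch, with hypothetical column names `age_next_year` and `is_oldest`:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Suki', 'Mai', 'Azula'], 'age': [15, 14, 16]})

# Arithmetic on a column is applied to every row
df['age_next_year'] = df['age'] + 1

# A comparison produces a boolean column
df['is_oldest'] = df['age'] == df['age'].max()

print(df)
```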

### Add New Row to DataFrame

In [None]:
# Values are assigned in column order: age, then Gender
df2.loc['Bolu'] = [17, np.nan]

In [None]:
df2
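Another way to add a row, sketched here as an alternative rather than a replacement for the `loc` assignment above, is to build a one-row DataFrame and concatenate it. The names `new_row` and `extended` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [15, 14]}, index=['Suki', 'Mai'])

# Build the new row as its own DataFrame, keyed by column name
new_row = pd.DataFrame({'age': [17]}, index=['Bolu'])

# concat() returns a new DataFrame; df itself is unchanged
extended = pd.concat([df, new_row])

print(extended)
```

Naming the columns explicitly avoids relying on the order of values in a list.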

### DataFrame Drop

- You can drop rows or columns from a DataFrame using the `drop()` method.

In [None]:
# Example: Dropping rows and columns
df2.drop('Azula', inplace=True)  # Dropping a row
df2

In [None]:
df2.drop('age', axis=1, inplace=True)  # Dropping a column
df2
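The cells above use `inplace=True`. A sketch of the non-mutating form, which returns a new DataFrame and leaves the original intact (sample data and result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [15, 14, 16], 'gender': ['female', 'female', 'female']},
    index=['Suki', 'Mai', 'Azula'],
)

# Drop a row by its index label
without_row = df.drop('Azula')

# Drop a column by name using the columns keyword instead of axis=1
without_col = df.drop(columns=['age'])

print(without_row)
print(without_col)
```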

In [None]:
# Practice Questions:
# 1. Drop a row from a DataFrame.
# 2. Drop a column from a DataFrame.

### DataFrame Concatenation

- You can concatenate DataFrames using the `concat()` function.

In [None]:
# Example: Concatenating DataFrames by Rows
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

In [None]:
df1

In [None]:
df2

In [None]:
df_concat = pd.concat([df1, df2], ignore_index=True)
df_concat
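To see what `ignore_index=True` changes, the following sketch compares the resulting index with and without it. The frames `top` and `bottom` are illustrative stand-ins.

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2]})
bottom = pd.DataFrame({'A': [3, 4]})

# By default each frame keeps its own labels, so labels repeat
kept = pd.concat([top, bottom])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([top, bottom], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```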

In [None]:
# Example: Concatenating DataFrames by Columns
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})

In [None]:
# Concatenate df1 and df2 along columns (axis=1) to merge them side by side
df_concat2 = pd.concat([df1, df2], axis=1)
df_concat2
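Column-wise concatenation lines rows up by index label rather than by position. The sketch below uses two made-up frames whose indexes only partly overlap to show the effect:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'C': [5, 6]}, index=[1, 2])

# Rows are matched on their labels; labels missing from one frame get NaN
side_by_side = pd.concat([left, right], axis=1)

print(side_by_side)
```

Only the row labeled 1 has values in both columns here, because it is the only label the two frames share.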

In [None]:
# Practice Questions:
# 1. Concatenate two DataFrames vertically.
# 2. Concatenate two DataFrames horizontally (using the axis parameter).

### Practice Questions

### Problem:

Write a Pandas program to create and display a DataFrame from the following dictionary data with index labels:

```
student_data = {
    'name': ['John', 'Sara', 'Tom', 'Lucy', 'Anna', 'Mike', 'Chris', 'Laura', 'Nick', 'Sophia'],
    'math_score': [90, 85, 78, np.nan, 95, 88, 92, np.nan, 79, 85],
    'english_score': [88, 92, 80, 76, 89, 85, 93, np.nan, 82, 88],
    'attempts': [1, 2, 2, 3, 1, 2, 1, 1, 2, 3],
    'pass': ['yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'yes']
}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

##### Instructions:
1. Print the first three rows using the `head()` method.
2. Delete rows with NaN values.
3. Extract the 'name' and 'math_score' columns from the DataFrame.
4. Append a new row 'k' to the DataFrame with these values (name: "Alex", math_score: 88, english_score: 91, attempts: 1, pass: "yes").
5. Delete the 'attempts' column from the DataFrame.
6. Add a new column "olympiad_team" that will have a value of 1 if the average score (math and english) is higher than 90, else 0.
7. Export the final DataFrame to a CSV file named "student_data.csv".

---
_**Your Dataness**_,  
`Obinna Oliseneku` (_**Hybraid**_)  
**[LinkedIn](https://www.linkedin.com/in/obinnao/)** | **[GitHub](https://github.com/hybraid6)**  