## **1. Introduction to Pandas:**

Pandas is a powerful open-source data analysis and manipulation library for Python. It provides data structures and functions needed to manipulate structured data easily.

Key Features:

- Provides two primary data structures: Series and DataFrame
- Easy handling of missing data
- Powerful data wrangling and filtering capabilities
- Integration with NumPy, Matplotlib, and other libraries
- Works well with structured data like CSV, Excel, SQL databases, etc.

## **2. Installation and Importing Pandas:**

### **Installation:**

Use `pip` to install Pandas:

In [7]:
pip install pandas



### **Importing Pandas:**

In [8]:
import pandas as pd

The `pd` alias is a common convention.

## **3. Pandas Data Structures:**
### **3.1 Series:**

A one-dimensional labeled array that can hold any data type.

#### Example:

In [9]:
import pandas as pd

# Creating a Series from a list
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

0    10
1    20
2    30
3    40
dtype: int64


Each element is indexed with an integer by default.
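
The default integer index can be replaced with custom labels. The following is a minimal sketch; the labels `"a"` to `"d"` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels "a".."d" replace the default 0..3 index
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s["b"]        # label-based access
by_position = s.iloc[1]  # position-based access
print(by_label, by_position)
```

Both lookups return the same element here, because `"b"` is the label of the second position.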

### **3.2 DataFrame:**

A two-dimensional, tabular data structure with labeled axes (rows and columns).

#### Example:

In [10]:
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


## **4. DataFrame Attributes:**

| Attribute | Description |
|:--------:|:--------:|
| df.shape | Returns the dimensions of the DataFrame |
| df.columns | Returns the column names |
| df.index | Returns the index of the DataFrame |
| df.dtypes | Returns data types of columns |

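#### Example:

The output below, and in the sections that follow, comes from a run in which `df` held a larger 41-column dataset (apparently the CSV loaded in Section 7.1) rather than the three-row example from Section 3.2, so the shape and column names differ from that example.

In [12]:
print("Shape:", df.shape)        # (rows, columns) tuple
print("Columns:", df.columns)    # Index holding the column labels
print("Data Types:", df.dtypes)  # dtype of each column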

Shape: (6085, 41)
Columns: Index(['HR', 'O2Sat', 'Temp', 'SBP', 'MAP', 'DBP', 'Resp', 'EtCO2',
       'BaseExcess', 'HCO3', 'FiO2', 'pH', 'PaCO2', 'SaO2', 'AST', 'BUN',
       'Alkalinephos', 'Calcium', 'Chloride', 'Creatinine', 'Bilirubin_direct',
       'Glucose', 'Lactate', 'Magnesium', 'Phosphate', 'Potassium',
       'Bilirubin_total', 'TroponinI', 'Hct', 'Hgb', 'PTT', 'WBC',
       'Fibrinogen', 'Platelets', 'Age', 'Gender', 'Unit1', 'Unit2',
       'HospAdmTime', 'ICULOS', 'SepsisLabel'],
      dtype='object')
Data Types: HR                  float64
O2Sat               float64
Temp                float64
SBP                 float64
MAP                 float64
DBP                 float64
Resp                float64
EtCO2               float64
BaseExcess          float64
HCO3                float64
FiO2                float64
pH                  float64
PaCO2               float64
SaO2                float64
AST                 float64
BUN                 float64
Alkalinephos     
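
For comparison, here is a small sketch of the same attributes on the three-row DataFrame from Section 3.2. The name `df_small` is introduced here only to avoid clashing with `df`.

```python
import pandas as pd

# Rebuild the three-row example so the block runs on its own
df_small = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df_small.shape)          # (3, 3)
print(list(df_small.columns))  # ['Name', 'Age', 'City']
print(df_small.index)          # default RangeIndex from 0 to 2
print(df_small.dtypes["Age"])  # int64
```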

## **5. Data Selection and Indexing:**

### **5.1. Selecting Columns:**

In [14]:
print(df["Gender"])

0       0.0
1       0.0
2       0.0
3       0.0
4       0.0
       ... 
6080    1.0
6081    1.0
6082    1.0
6083    1.0
6084    NaN
Name: Gender, Length: 6085, dtype: float64
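
Several columns can be selected at once by passing a list of names. A minimal sketch, using a hypothetical `df_small` built from the Section 3.2 data:

```python
import pandas as pd

df_small = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

age = df_small["Age"]               # single name -> Series
subset = df_small[["Name", "Age"]]  # list of names -> DataFrame
print(type(age).__name__, type(subset).__name__)
```

The double brackets are a list inside the indexing operator, which is why the result keeps its tabular form.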


### **5.2. Selecting Rows with `iloc` (Integer Location):**

In [15]:
print(df.iloc[1])  # Select second row

HR                  97.00
O2Sat               95.00
Temp                 0.00
SBP                 98.00
MAP                 75.33
DBP                  0.00
Resp                19.00
EtCO2                0.00
BaseExcess           0.00
HCO3                 0.00
FiO2                 0.00
pH                   0.00
PaCO2                0.00
SaO2                 0.00
AST                  0.00
BUN                  0.00
Alkalinephos         0.00
Calcium              0.00
Chloride             0.00
Creatinine           0.00
Bilirubin_direct     0.00
Glucose              0.00
Lactate              0.00
Magnesium            0.00
Phosphate            0.00
Potassium            0.00
Bilirubin_total      0.00
TroponinI            0.00
Hct                  0.00
Hgb                  0.00
PTT                  0.00
WBC                  0.00
Fibrinogen           0.00
Platelets            0.00
Age                 83.14
Gender               0.00
Unit1                0.00
Unit2                0.00
HospAdmTime 
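
`iloc` has a label-based counterpart, `loc`. The sketch below contrasts the two on a small DataFrame; the row labels are hypothetical and exist only to make the difference visible.

```python
import pandas as pd

df_small = pd.DataFrame(
    {"Age": [25, 30, 35]},
    index=["alice", "bob", "charlie"],  # hypothetical row labels
)

second_row = df_small.iloc[1]            # by integer position
bob_row = df_small.loc["bob"]            # by label
first_two = df_small.iloc[0:2]           # end position is excluded
up_to_bob = df_small.loc["alice":"bob"]  # end label is included
print(second_row["Age"], bob_row["Age"], len(first_two), len(up_to_bob))
```

Note the slicing difference: positional slices follow Python's half-open convention, while label slices include both endpoints.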

### **5.3. Selecting Rows with Conditions:**

In [16]:
# Select rows where Age > 25
print(df[df["Age"] > 25])

         HR  O2Sat  Temp    SBP    MAP  DBP  Resp  EtCO2  BaseExcess  HCO3  \
0       0.0    0.0   0.0    0.0   0.00  0.0   0.0    0.0         0.0   0.0   
1      97.0   95.0   0.0   98.0  75.33  0.0  19.0    0.0         0.0   0.0   
2      89.0   99.0   0.0  122.0  86.00  0.0  22.0    0.0         0.0   0.0   
3      90.0   95.0   0.0    0.0   0.00  0.0  30.0    0.0        24.0   0.0   
4     103.0   88.5   0.0  122.0  91.33  0.0  24.5    0.0         0.0   0.0   
...     ...    ...   ...    ...    ...  ...   ...    ...         ...   ...   
6044   60.0   97.0   0.0    0.0  96.00  0.0  20.0    0.0         0.0   0.0   
6045   66.0   97.0   0.0    0.0  84.00  0.0  17.0    0.0         0.0   0.0   
6046   61.0   98.0   0.0    0.0  81.00  0.0  14.0    0.0         0.0   0.0   
6047   64.0   97.0  36.5    0.0  93.00  0.0  14.0    0.0         0.0   0.0   
6048   60.0   97.0   0.0    0.0  93.00  0.0  15.0    0.0         0.0   0.0   

      ...  WBC  Fibrinogen  Platelets    Age  Gender  Unit1  Un
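
Conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. A minimal sketch on a hypothetical `df_small`:

```python
import pandas as pd

df_small = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Rows where both conditions hold
mask = (df_small["Age"] > 25) & (df_small["City"] != "Chicago")
both = df_small[mask]

# Rows where at least one condition holds
either = df_small[(df_small["Age"] < 30) | (df_small["City"] == "Chicago")]

print(both)
print(either)
```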

## **6. Handling Missing Data:**

### **6.1. Checking for Missing Values:**

In [17]:
print(df.isnull())  # Returns a DataFrame with True for missing values

         HR  O2Sat   Temp    SBP    MAP    DBP   Resp  EtCO2  BaseExcess  \
0     False  False  False  False  False  False  False  False       False   
1     False  False  False  False  False  False  False  False       False   
2     False  False  False  False  False  False  False  False       False   
3     False  False  False  False  False  False  False  False       False   
4     False  False  False  False  False  False  False  False       False   
...     ...    ...    ...    ...    ...    ...    ...    ...         ...   
6080  False  False  False  False  False  False  False  False       False   
6081  False  False  False  False  False  False  False  False       False   
6082  False  False  False  False  False  False  False  False       False   
6083  False  False  False  False  False  False  False  False       False   
6084  False  False  False  False  False  False  False  False       False   

       HCO3  ...    WBC  Fibrinogen  Platelets    Age  Gender  Unit1  Unit2  \
0     Fa
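
On a large DataFrame the full Boolean table is hard to read. One common approach is to count the missing values per column instead, sketched here on a small hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Age": [25, np.nan, 35],
    "City": ["New York", None, None],
})

# True counts as 1, so summing the Boolean table counts missing values
missing_per_column = df_small.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print(total_missing)
```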

### **6.2. Filling Missing Values:**

In [18]:
df["Age"] = df["Age"].fillna(0)  # assign the filled column back to df

Calling `df["Age"].fillna(0, inplace=True)` instead emits a warning in recent pandas versions: `df["Age"]` is an intermediate object that always behaves as a copy, so the in-place fill will never reach `df` once the behavior changes in pandas 3.0. The warning recommends either `df.fillna({"Age": 0}, inplace=True)` or the assignment form used above.
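
Different columns often need different fill values. The sketch below passes a dictionary to `fillna`; filling `Age` with the column mean and `City` with `"Unknown"` are illustrative choices, not recommendations for any particular dataset.

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0],
    "City": ["New York", None, "Chicago"],
})

# One fill value per column; the result is a new DataFrame
filled = df_small.fillna({
    "Age": df_small["Age"].mean(),  # mean of the non-missing ages
    "City": "Unknown",
})
print(filled)
```

Because the result is assigned to a new name, `df_small` itself still contains its missing values.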


### **6.3. Dropping Missing Values:**

In [19]:
df.dropna(inplace=True)
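
By default `dropna` removes every row that has at least one missing value, which can discard a lot of data. A minimal sketch of restricting the check to selected columns with `subset`, on a hypothetical `df_small`:

```python
import numpy as np
import pandas as pd

df_small = pd.DataFrame({
    "Age": [25.0, np.nan, 35.0],
    "City": ["New York", "Los Angeles", None],
})

any_missing = df_small.dropna()                 # drops rows 1 and 2
age_required = df_small.dropna(subset=["Age"])  # drops row 1 only
print(len(any_missing), list(age_required.index))
```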

## **7. File I/O Operations:**

### **7.1. Reading a CSV File:**

In [21]:
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Harsha/Dataset/Dataset.csv")

### **7.2. Writing a DataFrame to CSV:**

In [22]:
df.to_csv("output.csv", index=False)
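
A write followed by a read can be tried without touching the file system. This sketch uses an in-memory `io.StringIO` buffer in place of a file path, which is an assumption made only to keep the example self-contained.

```python
import io

import pandas as pd

df_small = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

buffer = io.StringIO()
df_small.to_csv(buffer, index=False)  # index=False omits the row labels
csv_text = buffer.getvalue()

buffer.seek(0)                        # rewind before reading back
restored = pd.read_csv(buffer)
print(csv_text)
print(restored)
```

Without `index=False`, the row labels would be written as an extra unnamed column and read back as data.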

## **8. Data Aggregation and Grouping:**

### **8.1. Grouping Data:**

In [24]:
grouped = df.groupby("SBP")
print(grouped["Age"].mean())

SBP
20.0     54.600000
21.0     63.000000
22.0     64.276667
23.5     59.870000
24.0     63.392000
           ...    
295.0    88.000000
296.0    45.000000
298.0    56.000000
299.0    33.000000
300.0    42.000000
Name: Age, Length: 987, dtype: float64
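
Grouping is usually easier to interpret when the key has few distinct values. A minimal sketch with a hypothetical categorical key (`City`) rather than a continuous measurement:

```python
import pandas as pd

df_small = pd.DataFrame({
    "City": ["New York", "Chicago", "New York", "Chicago"],
    "Age": [25, 30, 35, 40],
})

# One mean per distinct City value
mean_age = df_small.groupby("City")["Age"].mean()
print(mean_age)
```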


### **8.2. Aggregating Data:**

In [25]:
df["Age"].sum()  # Sum of the Age column

95150087.68999998
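
Several aggregations can be computed in one call with `agg`. A short sketch on a hypothetical `df_small`:

```python
import pandas as pd

df_small = pd.DataFrame({"Age": [25, 30, 35]})

# Passing a list of function names returns one value per name
summary = df_small["Age"].agg(["min", "max", "mean"])
print(summary)
```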

## **9. Data Cleaning and Transformation:**

### **9.1. Renaming Columns:**

In [26]:
df.rename(columns={"Name": "Full Name"}, inplace=True)
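
By default `rename` silently ignores keys that do not match any column, so a misspelled or absent name leaves the DataFrame unchanged. A minimal sketch on a hypothetical `df_small`, including a deliberate typo:

```python
import pandas as pd

df_small = pd.DataFrame({"Name": ["Alice"], "Age": [25]})

renamed = df_small.rename(columns={"Name": "Full Name"})
# "Nmae" is a deliberate typo: no column matches, so nothing changes.
# Passing errors="raise" would raise a KeyError instead.
unchanged = df_small.rename(columns={"Nmae": "Full Name"})

print(list(renamed.columns))
print(list(unchanged.columns))
```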

### **9.2. Removing Duplicates:**

In [27]:
df.drop_duplicates(inplace=True)
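
`drop_duplicates` compares entire rows by default. The `subset` and `keep` parameters narrow the comparison to chosen columns and select which occurrence survives. A minimal sketch on hypothetical data:

```python
import pandas as pd

df_small = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice"],
    "Age": [25, 30, 26],
})

# No two rows match in every column, so nothing is dropped
full_row = df_small.drop_duplicates()

# Compare on Name only and keep the last occurrence of each name
by_name = df_small.drop_duplicates(subset=["Name"], keep="last")
print(len(full_row), list(by_name.index))
```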

## **10. Merging and Joining DataFrames:**

### **10.1. Merging:**

In [29]:
df1 = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"ID": [1, 2], "Score": [90, 95]})

merged_df = pd.merge(df1, df2, on="ID")
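
`pd.merge` performs an inner join by default, keeping only keys present in both DataFrames. The sketch below adds unmatched IDs (3 and 4, hypothetical values) to show how the `how` parameter changes the result.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [1, 2, 4], "Score": [90, 95, 80]})

inner = pd.merge(df1, df2, on="ID")               # IDs in both: 1, 2
left = pd.merge(df1, df2, on="ID", how="left")    # all IDs from df1
outer = pd.merge(df1, df2, on="ID", how="outer")  # IDs from either side

print(inner)
print(left)   # Score is NaN for ID 3
print(outer)  # NaN wherever one side has no match
```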

### **10.2. Concatenation:**

In [30]:
concat_df = pd.concat([df1, df2], axis=0)
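
Unlike `merge`, `concat` does not match rows on a key. It stacks the inputs along an axis and fills columns that an input lacks with `NaN`. A minimal sketch using the same two DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2], "Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"ID": [1, 2], "Score": [90, 95]})

# axis=0 stacks rows; ignore_index=True renumbers them 0..3
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, so ID appears twice
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```

When the goal is to combine `Name` and `Score` for the same `ID`, `merge` is usually the better fit.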

## **11. Pivot Tables:**

Pivot tables summarize and aggregate data.

In [34]:
pivot = df.pivot_table(values="Age", index="SBP", aggfunc="mean")

pivot

| SBP | Age |
|:--------:|:--------:|
| 20.0 | 54.600000 |
| 21.0 | 63.000000 |
| 22.0 | 64.276667 |
| 23.5 | 59.870000 |
| 24.0 | 63.392000 |
| ... | ... |
| 295.0 | 88.000000 |
| 296.0 | 45.000000 |
| 298.0 | 56.000000 |
| 299.0 | 33.000000 |
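
A pivot table can also spread a second key across the columns. The sketch below uses small hypothetical data with `City` as the row key and `Gender` as the column key.

```python
import pandas as pd

df_small = pd.DataFrame({
    "City": ["New York", "New York", "Chicago", "Chicago"],
    "Gender": ["F", "M", "F", "M"],
    "Age": [25, 35, 30, 40],
})

# Rows: City, columns: Gender, cells: mean Age for each combination
pivot_small = df_small.pivot_table(
    values="Age", index="City", columns="Gender", aggfunc="mean"
)
print(pivot_small)
```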


## **12. Time Series Analysis:**

Pandas has extensive support for time series data.

#### **Creating a Time Series:**

In [35]:
date_range = pd.date_range(start="2024-01-01", periods=5, freq="D")
time_series = pd.Series([1, 2, 3, 4, 5], index=date_range)
print(time_series)

2024-01-01    1
2024-01-02    2
2024-01-03    3
2024-01-04    4
2024-01-05    5
Freq: D, dtype: int64
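
A datetime index enables date-based slicing and resampling. A minimal sketch, assuming six daily values so that the two-day bins divide evenly:

```python
import pandas as pd

date_range = pd.date_range(start="2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=date_range)

# Slice by date strings; both endpoints are included
window = ts.loc["2024-01-02":"2024-01-03"]

# Group the daily values into two-day bins and sum each bin
two_day_sum = ts.resample("2D").sum()

print(window)
print(two_day_sum)
```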


## **13. Visualization with Pandas:**

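Series and DataFrames provide a `plot()` method that draws charts through `matplotlib`, which must be installed. With `kind="bar"`, pandas draws one bar per row, so this chart type works best on small or aggregated data.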

In [None]:
df["Age"].plot(kind="bar")