# Lesson 3A: Introduction to Pandas and Data Frames

## 🎯 Learning Objectives:
1. Understand Python modules and the role of Pandas in data analysis.
2. Load CSV files into a Jupyter Notebook.
3. Explore data frames with Pandas.

## 1️⃣ Importing Python Modules

- A Python module is a file containing Python code (functions, classes, and variables) that you can reuse in other Python programs.
- The Pandas module will be used to prepare data for analysis.
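As a small illustration of reusing code from a module, the sketch below uses two standard-library modules, `statistics` and `math`. They are chosen only for illustration and are not used elsewhere in this lesson, and the `prices` list is made up. It shows the two common import styles: importing a module under an alias (as done with `pd` for Pandas) and importing a single function by name.

```python
# Import a whole module under a short alias, the same pattern as `import pandas as pd`.
import statistics as st

# Import a single function directly from a module.
from math import sqrt

prices = [4.0, 9.0, 16.0]  # made-up values for illustration

average = st.mean(prices)  # function reached through the module alias
root = sqrt(prices[2])     # function imported directly by name

print(average, root)
```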



In [1]:
import pandas as pd

## 2️⃣ Loading the Dataset

Before analyzing the dataset, we need to load it into a **DataFrame** using Pandas.

Let's begin by loading our dataset!

In [2]:

df = pd.read_csv("Restaurant_Transactions_Dataset.csv")

df.head()

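|   | Customer_ID | Food_Item | Category | Date_of_Visit | Time  | Weather | Price   | Weekend | Public_Holiday |
|---|-------------|-----------|----------|---------------|-------|---------|---------|---------|----------------|
| 0 | 1075        | Smoothie  | Cold     | 24/03/2023    | 12:30 | Sunny   | 14.88   | No      | No             |
| 1 | 1030        | Soup      | Hot      | 19/03/2023    | 14:30 | Raining | 6.7176  | Yes     | No             |
| 2 | 1055        | Ice Cream | Cold     | 24/03/2023    | 10:30 | Sunny   | 14.27   | No      | No             |
| 3 | 1058        | Ice Cream | Cold     | 05/03/2023    | 22:00 | Sunny   | 14.688  | Yes     | No             |
| 4 | 1084        | Smoothie  | Cold     | 29/03/2023    | 16:30 | Sunny   | 8.844   | No      | Yes            |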
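If the CSV file is not at hand, the loading step can be tried on a small in-memory sample. The sketch below is an illustration only: the rows are copied from the preview above, `io.StringIO` stands in for the file on disk, and the names `sample_csv` and `sample_df` are hypothetical.

```python
import io

import pandas as pd

# A few rows copied from the preview; io.StringIO stands in for the CSV file.
sample_csv = io.StringIO(
    "Customer_ID,Food_Item,Category,Date_of_Visit,Time,Weather,Price,Weekend,Public_Holiday\n"
    "1075,Smoothie,Cold,24/03/2023,12:30,Sunny,14.88,No,No\n"
    "1030,Soup,Hot,19/03/2023,14:30,Raining,6.7176,Yes,No\n"
    "1055,Ice Cream,Cold,24/03/2023,10:30,Sunny,14.27,No,No\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
sample_df = pd.read_csv(sample_csv)

# head(n) returns the first n rows (5 by default); tail(n) returns the last n.
print(sample_df.head(2))
print(sample_df.tail(1))
```

Passing a number to `head()` is handy when the default five rows are more than needed for a quick look.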


## 3️⃣ Exploring the Dataset

Now that we've loaded the dataset, let's explore its structure!

### 📊 Key things to check:
- **Number of rows & columns**
- **Column names & data types**
- **Summary statistics**

By doing this, we can understand what kind of data we're working with and spot any potential issues early on.


In [14]:
print(f"This dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

print("\nColumn names & data types:")
print(df.dtypes)

print("\nSummary statistics:")
print(df.describe())

This dataset contains 1002 rows and 9 columns.

Column names & data types:
Customer_ID         int64
Food_Item          object
Category           object
Date_of_Visit      object
Time               object
Weather            object
Price             float64
Weekend            object
Public_Holiday     object
dtype: object

Summary statistics:
       Customer_ID       Price
count  1002.000000  996.000000
mean   1048.681637   13.221576
std      29.341822    4.785411
min    1000.000000    4.509000
25%    1023.000000    9.096000
50%    1050.000000   13.157000
75%    1074.000000   16.982500
max    1099.000000   23.940000
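In the summary statistics, the `Price` count (996) is lower than the number of rows (1002). Because `count` only includes non-null values, this suggests that six prices are missing. The sketch below shows one way to locate missing values. It uses a small made-up frame rather than the lesson's dataset, and the names `sample_df`, `missing_per_column`, and `rows_missing_price` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing price (np.nan), for illustration only.
sample_df = pd.DataFrame({
    "Food_Item": ["Smoothie", "Soup", "Ice Cream"],
    "Price": [14.88, np.nan, 14.27],
})

# isna() marks missing cells as True; sum() then counts them per column.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Boolean indexing selects the rows where Price is missing.
rows_missing_price = sample_df[sample_df["Price"].isna()]
print(rows_missing_price)
```

The same two calls can be run on `df` to see which columns of the full dataset contain gaps.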


### 📊 Common Aggregate Functions in Pandas

| Function       | Description                                                                  | Example Usage                           |
|----------------|------------------------------------------------------------------------------|-----------------------------------------|
| `sum()`        | Returns the sum of values for each column or row.                            | `df.sum(numeric_only=True)`             |
| `mean()`       | Calculates the average (mean) of numeric values.                             | `df.mean(numeric_only=True)`            |
| `median()`     | Returns the median (middle) value.                                           | `df.median(numeric_only=True)`          |
| `min()`        | Finds the minimum value in each column or row.                               | `df.min()`                              |
| `max()`        | Finds the maximum value in each column or row.                               | `df.max()`                              |
| `count()`      | Counts the number of non-null values.                                        | `df.count()`                            |
| `std()`        | Calculates the standard deviation of numeric values.                         | `df.std(numeric_only=True)`             |
| `var()`        | Computes the variance of each numeric column.                                | `df.var(numeric_only=True)`             |
| `describe()`   | Generates descriptive statistics for numeric columns (count, mean, std...).  | `df.describe()`                         |
| `agg()`        | Applies multiple aggregate functions at once.                                | `df['Price'].agg(['sum', 'mean'])`      |

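> 📝 **Note**: By default, these functions operate column-wise. To apply them row-wise, use `axis=1`, e.g., `df.sum(axis=1, numeric_only=True)`. In pandas 2.0 and later, functions such as `mean()`, `median()`, `std()`, and `var()` raise a `TypeError` when the DataFrame contains text columns such as `Food_Item`, so pass `numeric_only=True` or select the numeric columns first.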
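The following sketch tries a few of these aggregate functions on a small made-up frame. The `Quantity` column is hypothetical and does not exist in the lesson's dataset; it is included only so that a row-wise sum has more than one numeric column to work with.

```python
import pandas as pd

# Made-up sample with one text column and two numeric columns.
sample_df = pd.DataFrame({
    "Food_Item": ["Smoothie", "Soup", "Ice Cream"],
    "Price": [10.0, 6.0, 14.0],
    "Quantity": [1, 2, 3],  # hypothetical column, not in the lesson's dataset
})

# Column-wise mean, restricted to the numeric columns.
column_means = sample_df.mean(numeric_only=True)

# Row-wise sum across the numeric columns only.
row_sums = sample_df.sum(axis=1, numeric_only=True)

# Several aggregates at once on selected numeric columns.
summary = sample_df[["Price", "Quantity"]].agg(["sum", "mean"])

print(column_means, row_sums, summary, sep="\n\n")
```

Selecting the numeric columns before calling `agg()` keeps the text column `Food_Item` out of the calculation.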
