# 📘 Day 23: Introduction to Pandas

Welcome to Pandas! Pandas is one of the most widely used Python libraries for data manipulation and analysis. It provides the data structures and functions you need to work with structured (tabular) data.

The primary data structure in Pandas is the **DataFrame**, which is like a spreadsheet or a SQL table. It's a two-dimensional labeled data structure with columns of potentially different types.

## 1. Creating a DataFrame

A common way to create a DataFrame from scratch is by using a Python dictionary. The keys of the dictionary become the column names, and each key's list of values becomes the data in that column, so every list must have the same length.

In [None]:
# The standard convention for importing pandas is to use the alias 'pd'
import pandas as pd
import plotly.express as px

data = {
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Category": [
        "Electronics",
        "Electronics",
        "Electronics",
        "Electronics",
        "Peripherals",
    ],
    "Price": [1200, 25, 75, 300, 50],
    "Units Sold": [150, 300, 220, 180, 250],
}

# Create the DataFrame
df = pd.DataFrame(data)

# Display the DataFrame. In a notebook, this provides a nice HTML table.
display(df)
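
A dictionary of lists is not the only option. As a small sketch (the names `records` and `df_from_records` are only for illustration and are not part of this lesson's data), a list of dictionaries also works, with each dictionary describing one row:

```python
import pandas as pd

# Each dictionary is one row; its keys become the column names.
records = [
    {"Product Name": "Laptop", "Price": 1200, "Units Sold": 150},
    {"Product Name": "Mouse", "Price": 25, "Units Sold": 300},
]

df_from_records = pd.DataFrame(records)

print(df_from_records.shape)          # (number of rows, number of columns)
print(list(df_from_records.columns))  # column names taken from the dictionary keys
```

This row-by-row shape is often what you get from JSON data, while the dictionary-of-lists shape used above is handy when you already think in columns.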

## 2. Inspecting the DataFrame

These are the first commands you should run after creating or loading a DataFrame to understand its structure and content.

### `.head()`
The `.head()` method shows the first few rows of the DataFrame (5 by default). It's great for getting a quick snapshot of your data.

In [None]:
df.head()
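
If five rows is too many or too few, `.head()` accepts the number of rows you want, and `.tail()` does the same from the end. A minimal sketch, using a trimmed copy of the product data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Price": [1200, 25, 75, 300, 50],
})

first_two = df.head(2)  # first 2 rows instead of the default 5
last_two = df.tail(2)   # last 2 rows

print(first_two)
print(last_two)
```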

### `.info()`
The `.info()` method prints a concise summary of the DataFrame's structure: the index type, each column's name and data type, the number of non-null values per column, and memory usage.

In [None]:
df.info()
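
The individual pieces of that summary are also available on their own, which helps when you want to use them in code rather than just read them. A short sketch, again with a trimmed copy of the product data:

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Price": [1200, 25, 75, 300, 50],
    "Units Sold": [150, 300, 220, 180, 250],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column

# Count missing values per column; all zeros means no gaps in the data.
missing_per_column = df.isna().sum()
print(missing_per_column)
```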

### `.describe()`
The `.describe()` method generates descriptive statistics for the numerical columns: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum.

In [None]:
df.describe()
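
The result of `.describe()` is itself a DataFrame, so you can pull single numbers out of it. The sketch below also passes `include="all"` to summarize text columns as well. It uses a trimmed copy of the product data, and the names `stats`, `mean_price`, and `all_stats` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Electronics", "Electronics", "Peripherals"],
    "Price": [1200, 25, 75, 300, 50],
})

stats = df.describe()                    # numerical columns only
mean_price = stats.loc["mean", "Price"]  # row label "mean", column "Price"
print(mean_price)                        # 330.0

all_stats = df.describe(include="all")   # adds unique, top, and freq for text columns
print(all_stats)
```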

## 3. Selecting Columns

You can select columns from a DataFrame to work with specific subsets of your data.

### Selecting a Single Column
To select a single column, use its name in square brackets. This returns a Pandas **Series**, a one-dimensional labeled array that behaves like a single column of a DataFrame.

In [None]:
price_column = df["Price"]
display(price_column)
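
A Series comes with its own methods, so once you have a column you can summarize it directly. A minimal sketch with the same prices:

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Price": [1200, 25, 75, 300, 50],
})

price_column = df["Price"]

print(type(price_column))   # a pandas Series, not a DataFrame
print(price_column.max())   # highest price
print(price_column.mean())  # average price
```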

### Selecting Multiple Columns
To select multiple columns, pass a list of column names inside the square brackets. This returns a new, smaller DataFrame.

In [None]:
product_and_sales = df[["Product Name", "Units Sold"]]
display(product_and_sales)
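
The brackets matter even for one column: single brackets give a Series, while a one-item list gives a DataFrame with one column. A quick sketch to see the difference (the names `as_series` and `as_frame` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Price": [1200, 25, 75, 300, 50],
})

as_series = df["Price"]   # single brackets: a Series
as_frame = df[["Price"]]  # list of one name: a DataFrame with one column

print(type(as_series))
print(type(as_frame))
print(as_frame.shape)     # (rows, columns)
```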

## 4. Creating a New Column (Vectorized Operation)

One of the most powerful features of Pandas is **vectorization**. You can perform operations on entire columns at once without writing an explicit loop. Here, we create a new 'Revenue' column by multiplying the 'Price' and 'Units Sold' columns element by element.

In [None]:
df["Revenue"] = df["Price"] * df["Units Sold"]
display(df.head())
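
To see what vectorization saves you, here is the same calculation written both ways. This is only a comparison sketch, and the names `revenue_loop` and `revenue_vectorized` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [1200, 25, 75, 300, 50],
    "Units Sold": [150, 300, 220, 180, 250],
})

# Loop version: walk through the rows one pair of values at a time.
revenue_loop = []
for price, units in zip(df["Price"], df["Units Sold"]):
    revenue_loop.append(price * units)

# Vectorized version: one expression multiplies the columns row by row.
revenue_vectorized = df["Price"] * df["Units Sold"]

print(revenue_loop)
print(revenue_vectorized.tolist())
```

Both produce the same numbers, but the vectorized form is shorter and is usually much faster on large DataFrames.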

## 5. Visualization

Now that we have a 'Revenue' column, we can easily create an interactive chart to visualize it. Here, we use Plotly Express to create a bar chart showing the revenue generated by each product.

In [None]:
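# Bar chart of revenue per product, with bars colored by category
fig = px.bar(
    df,
    x="Product Name",
    y="Revenue",
    title="Total Revenue by Product",
    labels={"Product Name": "Product", "Revenue": "Total Revenue ($)"},
    color="Category",
    text_auto=".2s",  # label each bar with its value (SI prefix, 2 significant digits)
)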

fig.show()
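
To double-check what the bars show, you can view the same numbers as a table. The sketch below rebuilds the data so it runs on its own and uses `sort_values`, which this lesson has not covered, to list the highest revenue first. The name `revenue_table` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam"],
    "Price": [1200, 25, 75, 300, 50],
    "Units Sold": [150, 300, 220, 180, 250],
})
df["Revenue"] = df["Price"] * df["Units Sold"]

# Keep only the two columns the chart uses, highest revenue first.
revenue_table = df[["Product Name", "Revenue"]].sort_values("Revenue", ascending=False)
print(revenue_table)
```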