
# Getting Started with pandas 📊

Welcome! This notebook will introduce you to `pandas`, a powerful Python library for data manipulation and analysis.


In [None]:

import pandas as pd
import numpy as np



## What is a Series?

A `Series` is a one-dimensional labeled array that can hold data of any type.


In [None]:

print("Creating a Series with numbers and a missing value:")
numbers = pd.Series([10, 20, 30, np.nan, 50])
print(numbers)
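
The "labeled" part of the definition is easier to see with an explicit index. The following is a minimal sketch; the labels `a`, `b`, `c` and the name `labeled` are illustrative choices, not part of the cell above.

```python
import pandas as pd

# Illustrative labels; any hashable values can serve as an index.
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Values can be looked up by label or by integer position.
by_label = labeled["b"]
by_position = labeled.iloc[1]
print(by_label, by_position)
```

Note that `numbers` in the cell above is stored as `float64`, because `np.nan` is a floating-point value.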



## Creating a DataFrame

A `DataFrame` is a 2D table with labeled axes (rows and columns).


In [None]:

print("Creating a random 3x4 DataFrame with fruit names as columns:")
data = np.random.randint(1, 100, size=(3, 4))
df = pd.DataFrame(data, columns=['Apples', 'Bananas', 'Oranges', 'Pears'])
print(df)
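
Because the table above is random, its values change on every run. As a sketch of two common alternatives, the block below builds the same kind of table from a seeded generator and also builds one from a dict of columns. The seed value and the names `seeded` and `from_dict` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" table identical on every run.
rng = np.random.default_rng(seed=0)  # the seed value is arbitrary
values = rng.integers(1, 100, size=(3, 4))  # upper bound is exclusive
seeded = pd.DataFrame(values, columns=["Apples", "Bananas", "Oranges", "Pears"])

# A dict maps column names to column values.
from_dict = pd.DataFrame({"Apples": [3, 7, 5], "Bananas": [12, 4, 9]})

print(seeded)
print(from_dict)
```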



## Working with Dates

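pandas makes working with date ranges simple: `pd.date_range` generates a sequence of evenly spaced dates (daily by default) that can serve as a DataFrame index.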


In [None]:


print("Creating a DataFrame with a DateTime index:")
date_index = pd.date_range(start='2022-01-01', periods=5)
df_dates = pd.DataFrame(np.random.randn(5, 3), index=date_index, columns=['X', 'Y', 'Z'])
print(df_dates)
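
One practical benefit of a date index is that rows can be selected with date strings. The sketch below assumes a small table named `daily` with fixed values, which is a hypothetical stand-in for the random `df_dates` above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start="2022-01-01", periods=5, freq="D")
daily = pd.DataFrame({"X": np.arange(5)}, index=dates)

# String labels are parsed as dates; both end points of the slice are included.
first_three = daily.loc["2022-01-01":"2022-01-03"]
print(first_three)
```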


## Data Inspection

You can view portions of the DataFrame or inspect its structure easily.


In [None]:
print("First rows of the DataFrame:")
print(df.head())

print("Last rows of the DataFrame:")
print(df.tail())

print("Column names in the DataFrame:")
print(df.columns)
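
Beyond `head()`, `tail()`, and `columns`, a few other attributes and methods help describe a table's structure. This is a short sketch on a hypothetical table named `inventory` with fixed values, so the output is reproducible.

```python
import pandas as pd

# Small hypothetical table with fixed values.
inventory = pd.DataFrame({"Apples": [3, 7, 5], "Bananas": [12, 4, 9]})

print(inventory.shape)       # (number of rows, number of columns)
print(inventory.dtypes)      # data type of each column
print(inventory.describe())  # count, mean, std, min, quartiles, max per numeric column
```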


## Data Transformation

Here we demonstrate how to modify and filter data using pandas.


In [None]:
print("Adding a new column 'Total' which is the row-wise sum:")
df['Total'] = df.sum(axis=1)

print("Sorting the DataFrame by the 'Total' column:")
df_sorted = df.sort_values(by='Total')
print(df_sorted)
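
Filtering is usually done with a boolean mask. The block below is a minimal sketch on hypothetical fixed data (`fruit`), used in place of the random `df` so the result is predictable; the thresholds are arbitrary.

```python
import pandas as pd

# Hypothetical fixed data, standing in for the random `df` above.
fruit = pd.DataFrame({"Apples": [30, 80, 55], "Bananas": [60, 10, 45]})

# A comparison on a column yields a boolean Series (the mask)...
mask = fruit["Apples"] > 50

# ...and indexing with the mask keeps only the rows where it is True.
many_apples = fruit[mask]
print(many_apples)

# Conditions combine with & (and) and | (or); each needs its own parentheses.
both = fruit[(fruit["Apples"] > 50) & (fruit["Bananas"] > 40)]
print(both)
```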


## Selecting Rows

Use `loc` for label-based access and `iloc` for integer-location access.


In [None]:
print("Selecting rows 0 to 1 and columns 1 to 2 using iloc:")
selected = df.iloc[0:2, 1:3]
print(selected)
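
For comparison, here is a sketch of the same kind of selection with `loc`. The table `stock` and its row labels are hypothetical; string labels are used so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with string row labels.
stock = pd.DataFrame(
    {"Apples": [30, 80, 55], "Bananas": [60, 10, 45], "Pears": [5, 15, 25]},
    index=["mon", "tue", "wed"],
)

# loc selects by label, and label slices include the end point.
by_label = stock.loc["mon":"tue", "Apples":"Bananas"]

# iloc selects by position, and position slices exclude the end point.
by_position = stock.iloc[0:2, 0:2]

print(by_label)
print(by_label.equals(by_position))
```

The two selections return the same rows and columns even though the slice bounds look different, because only `loc` includes the end of the slice.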


## Handling Missing Data

Missing values are common in real-world datasets. Here's how to detect and handle them.


In [None]:
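print("Creating a float copy of the DataFrame and inserting a NaN value:")
# NaN is a float; cast first so the assignment does not rely on implicit upcasting of integer columns.
df_with_nan = df.astype(float)
df_with_nan.iloc[0, 0] = np.nan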

print("Checking where the missing values are:")
print(df_with_nan.isna())

print("Any missing?", df_with_nan.isna().any().any())
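
Once missing values are detected, the two most common ways to handle them are dropping and filling. The following sketch uses a hypothetical table `readings`; which option is appropriate depends on the data, so treat these as illustrations rather than recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
readings = pd.DataFrame({"X": [1.0, np.nan, 3.0], "Y": [4.0, 5.0, np.nan]})

# Option 1: drop every row that contains at least one NaN.
dropped = readings.dropna()

# Option 2: replace each NaN with a constant.
filled_zero = readings.fillna(0)

# Option 3: replace each NaN with the mean of its own column.
filled_mean = readings.fillna(readings.mean())

print(dropped)
print(filled_zero)
print(filled_mean)
```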


## Applying Functions

`.apply()` calls a function on each value of a Series; on a DataFrame it passes each column (or each row, with `axis=1`) to the function.


In [None]:
print("Applying a lambda function to double the 'Apples' column:")
df['Apples x2'] = df['Apples'].apply(lambda x: x * 2)
print(df)
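
For simple arithmetic, plain vectorized operations give the same result as `.apply()` and are usually faster. The sketch below compares the two and also shows `DataFrame.apply` working column by column; `fruit` is hypothetical fixed data.

```python
import pandas as pd

# Hypothetical fixed data.
fruit = pd.DataFrame({"Apples": [30, 80, 55], "Bananas": [60, 10, 45]})

# Element-wise apply on one column, as in the cell above.
doubled_apply = fruit["Apples"].apply(lambda x: x * 2)

# The same result with vectorized arithmetic.
doubled_vectorized = fruit["Apples"] * 2

# DataFrame.apply passes whole columns (axis=0) or whole rows (axis=1) to the function.
column_ranges = fruit.apply(lambda col: col.max() - col.min(), axis=0)

print(doubled_apply.equals(doubled_vectorized))
print(column_ranges)
```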


## Grouping Data

Use `groupby()` to perform operations over subsets of your data.


In [None]:
print("Adding a 'Category' column, then grouping by it and calculating the mean of 'Apples' and 'Bananas':")
df['Category'] = ['Fruit', 'Fruit', 'Vegetable']  # one label per row
print(df.groupby('Category')[['Apples', 'Bananas']].mean())
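
A group can be summarized with several statistics at once using `.agg()`. This is a small sketch on a hypothetical table `sales` with fixed values, so each group's result can be checked by hand.

```python
import pandas as pd

# Hypothetical fixed data with the same 'Category' labels as above.
sales = pd.DataFrame(
    {
        "Category": ["Fruit", "Fruit", "Vegetable"],
        "Apples": [10, 30, 50],
        "Bananas": [20, 40, 60],
    }
)

# On a single column, agg with a list of function names returns one column per function.
summary = sales.groupby("Category")["Apples"].agg(["mean", "sum", "count"])
print(summary)
```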