# Just Enough Python for AI/Data Science
#### Module 7

**Introduction to Popular Data Science Libraries**
___
> In this module, you’ll get a sneak peek at the three “celebrity guests” of Python’s data science universe: NumPy for numeric computing, pandas for data wrangling, and matplotlib/seaborn for visualization. The idea is to show that with just the Python basics (plus these libraries), you can already do some powerful data tasks—without diving deep into actual ML modeling quite yet.

##### Overview:
- This module serves as a crash course in the essential data science libraries: NumPy, pandas, and a sprinkle of matplotlib/seaborn for plotting.
- By the end, you’ll know how to install them, do some basic manipulations, and create simple visuals. Think of it as a gateway into the magical world of data analysis.

**1. NumPy: Your High-Speed Number Cruncher**
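- NumPy arrays are like Python lists on steroids. They hold elements of a single type in one contiguous block of memory, so math on a whole array runs much faster than looping over a list.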
- Example:

```python
import numpy as np

my_array = np.array([1, 2, 3, 4])
print(my_array * 2)    # [2 4 6 8]: every element doubled
print(my_array.shape)  # (4,): one dimension holding 4 elements
```

Output:

```
[2 4 6 8]
(4,)
```
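A quick way to see the "on steroids" difference is to apply the same `* 2` to a plain list and to an array. This is a small illustrative sketch, not a benchmark:

```python
import numpy as np

py_list = [1, 2, 3, 4]
arr = np.array(py_list)

# For a list, * 2 repeats the sequence instead of doing math
print(py_list * 2)   # [1, 2, 3, 4, 1, 2, 3, 4]

# For an array, * 2 multiplies every element
print(arr * 2)       # [2 4 6 8]

# The same math on a plain list needs an explicit loop or comprehension
doubled_list = [x * 2 for x in py_list]

# Arrays store one dtype for all elements (an integer type here)
print(arr.dtype)
```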


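- Broadcasting, slicing, and vectorized math functions (np.sqrt, np.mean, and friends) are where NumPy really shines.
- Arithmetic between two arrays of the same shape works element-wise with no loop, and broadcasting extends this to compatible shapes, such as an array combined with a single number.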
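A minimal sketch of element-wise math, broadcasting, slicing, and vectorized functions, using made-up sample arrays:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Same shape: each element of a is paired with the matching element of b
total = a + b            # 11, 22, 33, 44
product = a * b          # 10, 40, 90, 160

# Broadcasting: the single number 0.5 is applied to every element
halved = a * 0.5         # 0.5, 1.0, 1.5, 2.0

# Broadcasting across dimensions: a (3, 1) column plus a (4,) row gives a (3, 4) grid
col = np.array([[0.0], [10.0], [20.0]])
grid = col + a
print(grid.shape)        # (3, 4)

# Slicing works like list slicing, but returns a view that shares a's memory
middle = a[1:3]          # 2.0, 3.0

# Vectorized math functions process the whole array without a Python loop
roots = np.sqrt(a)
print(a.mean())          # 2.5
```

Because `middle` is a view, assigning to `middle[0]` would also change `a[1]`; call `.copy()` on a slice when you need an independent array.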

**2. pandas: DataFrames for the Win**
- pandas gives you the beloved DataFrame structure, which is basically a supercharged spreadsheet in code form.
- A typical workflow:

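```python
import pandas as pd

# Read data from a CSV file (sample_data.csv stands in for your own file)
df = pd.read_csv("sample_data.csv")
print(df.head())  # First 5 rows

# Basic info: column names, dtypes, and non-null counts.
# df.info() prints its report directly and returns None, so no print() is needed.
df.info()

# Simple summary stats for the numeric columns
print(df.describe())
```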
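If you don't have a CSV file at hand, the same workflow can be tried on a small in-memory CSV. The table below is hypothetical; `io.StringIO` lets `pd.read_csv` read a string as if it were a file, and the empty Salary cell shows how missing values appear:

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; Cara's salary is deliberately missing
csv_text = """Name,Age,Salary
Ana,28,52000
Ben,35,61000
Cara,42,
Dev,31,58000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows (up to 5 by default)
df.info()             # dtypes and non-null counts; Salary shows 3 non-null of 4
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column
```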

- You can filter, group, sort, pivot, and merge DataFrames: it’s your Swiss Army knife for data.
- Perfect for quick exploration and a must-have in data science and AI projects.
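Here is a short sketch of filtering, sorting, grouping, merging, and pivoting on a tiny hypothetical employee table (all names, departments, and numbers are made up):

```python
import pandas as pd

# Hypothetical employee data, for illustration only
employees = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "Dept": ["Eng", "Eng", "Sales", "Sales", "HR"],
    "Level": ["Junior", "Senior", "Senior", "Junior", "Senior"],
    "Age": [28, 35, 42, 31, 45],
    "Salary": [52000, 61000, 58000, 49000, 55000],
})

# Filter: a boolean mask keeps only rows where the condition is True
over_30 = employees[employees["Age"] > 30]

# Sort: highest salary first
by_salary = employees.sort_values("Salary", ascending=False)

# Group: average salary per department
avg_salary = employees.groupby("Dept")["Salary"].mean()

# Merge: attach a city from a second (hypothetical) table, matching on Dept
locations = pd.DataFrame({"Dept": ["Eng", "Sales", "HR"],
                          "City": ["Berlin", "Lisbon", "Oslo"]})
merged = employees.merge(locations, on="Dept", how="left")

# Pivot: one row per Dept, one column per Level, mean Salary in each cell
pivot = employees.pivot_table(index="Dept", columns="Level",
                              values="Salary", aggfunc="mean")
print(pivot)  # HR has no Junior rows, so that cell is NaN
```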

**3. matplotlib & seaborn: Drawing Pretty Graphs**
- matplotlib is the fundamental plotting library—great for custom charts.
- seaborn (built on matplotlib) provides prettier, more “statistically-minded” plots out of the box.
- Example:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()  # Apply seaborn's default style to all matplotlib plots

# Suppose 'df' from above has columns 'Age' and 'Salary'
plt.scatter(df['Age'], df['Salary'])
plt.title("Age vs Salary")
plt.xlabel("Age")
plt.ylabel("Salary")
plt.show()
```

**A Fun Mini Project (Combining Everything)**
- Load a CSV using pandas.
- Get some basic stats (df.describe()).
- Clean up any missing values (e.g., df.dropna()).
- Convert a column into a NumPy array to do a quick math operation (like scaling).
- Finally, plot a histogram of one numerical column or a bar chart of some category distribution.
- Show that all this is possible with “just enough Python” plus these libraries.
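One possible version of this mini project, assuming a small hypothetical table in place of a real CSV (swap in pd.read_csv with your own file path). Min-max scaling maps the smallest salary to 0 and the largest to 1:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 1. Load data: a hypothetical table stands in for pd.read_csv("your_file.csv")
data = pd.DataFrame({
    "Dept": ["Eng", "Sales", "Eng", "HR", "Sales", "Eng", None],
    "Salary": [52000, 48000, np.nan, 55000, 61000, 58000, 50000],
})

# 2. Basic stats for the numeric columns
print(data.describe())

# 3. Drop every row that has a missing value in any column
clean = data.dropna()

# 4. NumPy: min-max scale salaries into the range [0, 1]
salaries = clean["Salary"].to_numpy()
scaled = (salaries - salaries.min()) / (salaries.max() - salaries.min())
print(scaled.round(3))

# 5. Bar chart of how many rows fall into each department
counts = clean["Dept"].value_counts()
plt.bar(counts.index, counts.values)
plt.title("Rows per department")
plt.xlabel("Dept")
plt.ylabel("Count")
plt.show()
```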

**Installation & Quick Tips**
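- Installation: run pip install numpy pandas matplotlib seaborn, or conda install numpy pandas matplotlib seaborn if you’re in the Anaconda ecosystem.
- Check versions if weird errors occur: upgrading one library can break another that depends on it. For example, NumPy 2.0 broke some packages that had been compiled against NumPy 1.x.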
- Start simple: load a small CSV with around 100 rows, not a massive 10GB dataset.
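A quick way to check versions and to start small is sketched below. The 1,000-row CSV is generated in memory as a hypothetical stand-in for a large file; the `nrows` parameter tells `pd.read_csv` to parse only the first rows. For a much fuller report, pandas also offers `pd.show_versions()`.

```python
import io

import matplotlib
import numpy as np
import pandas as pd
import seaborn as sns

# Collect installed versions; include them when searching for or reporting an error
versions = {
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "matplotlib": matplotlib.__version__,
    "seaborn": sns.__version__,
}
for name, version in versions.items():
    print(f"{name:<11}{version}")

# Hypothetical 1,000-row CSV built in memory to stand in for a big file
big_csv = "x,y\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

# nrows=100 parses only the first 100 data rows (the header is not counted)
preview = pd.read_csv(io.StringIO(big_csv), nrows=100)
print(preview.shape)  # (100, 2)
```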