# Part 3: Package Management and Introduction to Key Libraries

In this notebook, we'll learn how to install and import Python packages, and get introduced to essential data science libraries.

## Topics Covered:
- Installing packages with pip
- Importing packages and modules
- Introduction to NumPy
- Introduction to Pandas
- Introduction to Scikit-learn
- Visualization with Matplotlib

## 1. Installing Packages

Python packages extend the functionality of Python. We use `pip` to install them.

### Common pip commands:
```bash
# Install a package
pip install package_name

# Install specific version
pip install package_name==1.2.3

# Install from requirements.txt
pip install -r requirements.txt

# List installed packages
pip list

# Show package info
pip show package_name

# Uninstall a package
pip uninstall package_name
```

**Note:** In this Codespace, all required packages are already installed via `requirements.txt`.
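
To confirm from Python that a package is actually installed, the standard library's `importlib.metadata` can be queried. The sketch below is a minimal illustration; `installed_version` is a hypothetical helper name, not part of pip, and the package names in the loop are just the ones this notebook relies on.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    `installed_version` is a hypothetical helper, not a pip or stdlib function.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used on PyPI (note: "scikit-learn", not "sklearn")
for name in ["numpy", "pandas", "scikit-learn"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A missing package shows up as `not installed` instead of raising an error, which makes the check safe to run before any imports.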

## 2. Importing Packages

After installation, we need to import packages to use them:

In [None]:
# Different ways to import

# Import entire package
import numpy

# Import with alias (common convention)
import numpy as np
import pandas as pd

# Import specific function/class
from sklearn.linear_model import LinearRegression

# Import multiple items
from sklearn.model_selection import train_test_split, cross_val_score

print("Packages imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
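
The import styles above all bind names to the same underlying objects. The sketch below illustrates this with the standard library's `math` module, so it runs even without third-party packages. The guarded import at the end is one common pattern for optional dependencies; `hypothetical_optional_pkg_xyz` is a placeholder name assumed not to exist.

```python
import math            # import the whole module
import math as m       # same module, bound to a shorter alias
from math import sqrt  # bind a single function directly

print(m is math)          # True: the alias points at the same module object
print(sqrt is math.sqrt)  # True: the direct import is the same function object
print(sqrt(16))

# Guard an optional dependency (placeholder package name, assumed missing)
try:
    import hypothetical_optional_pkg_xyz
except ImportError:
    hypothetical_optional_pkg_xyz = None
    print("Optional package not available; continuing without it.")
```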

## 3. Introduction to NumPy

NumPy is the fundamental package for numerical computing in Python. It provides the n-dimensional array type (`ndarray`) and fast, vectorized operations on arrays and matrices.

In [None]:
import numpy as np

# Creating arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("1D Array:", arr1)
print("\n2D Array:\n", arr2)
print("\nShape of arr2:", arr2.shape)
print("Data type:", arr1.dtype)
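
Beyond `shape` and `dtype`, arrays can be inspected and indexed element by element. The following sketch rebuilds the same 2D array and shows a few common access patterns.

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

print("Dimensions:", arr2.ndim)        # number of axes
print("Total elements:", arr2.size)
print("Row 0, column 1:", arr2[0, 1])  # indexing is zero-based
print("First column:", arr2[:, 0])     # ':' selects every row
print("Last row:", arr2[-1])           # negative indices count from the end
```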

In [None]:
# Useful NumPy functions

# Create arrays with specific values
zeros = np.zeros((3, 4))  # 3x4 array of zeros
ones = np.ones((2, 3))    # 2x3 array of ones
range_arr = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1

print("Zeros:\n", zeros)
print("\nOnes:\n", ones)
print("\nRange:", range_arr)
print("Linspace:", linspace)
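
One detail worth noting about the functions above: `np.arange` excludes its stop value, while `np.linspace` includes the endpoint by default. The sketch below shows the difference and adds `reshape`, which is often paired with `arange` to build small test grids.

```python
import numpy as np

# arange stops before 10; linspace includes 1.0 as its last value
stepped = np.arange(0, 10, 2)
spaced = np.linspace(0, 1, 5)
print("arange:", stepped)
print("linspace:", spaced)

# reshape rearranges the same six values into 2 rows and 3 columns
grid = np.arange(6).reshape(2, 3)
print("Reshaped grid:\n", grid)
```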

In [None]:
# Array operations
arr = np.array([1, 2, 3, 4, 5])

print("Original array:", arr)
print("Array + 10:", arr + 10)
print("Array * 2:", arr * 2)
print("Array squared:", arr ** 2)
print("\nMean:", arr.mean())
print("Sum:", arr.sum())
print("Standard deviation:", arr.std())  # population std (ddof=0)
print("Min:", arr.min())
print("Max:", arr.max())
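
By default, `std()` computes the population standard deviation (dividing by n). When the data is a sample, the `ddof=1` argument switches to the sample standard deviation (dividing by n - 1). A short sketch of the difference on the same array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_std = arr.std()    # default ddof=0: divide by n
sample_std = arr.std(ddof=1)  # ddof=1: divide by n - 1

print(f"Population std: {population_std:.4f}")
print(f"Sample std:     {sample_std:.4f}")
```

The two values diverge most on small arrays like this one, so it helps to be explicit about which is intended.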

## 4. Introduction to Pandas

Pandas is the go-to library for data manipulation and analysis. It provides DataFrame and Series objects.

In [None]:
import pandas as pd

# Creating a Series (1D labeled array)
series = pd.Series([10, 20, 30, 40, 50])
print("Series:")
print(series)

# Creating a Series with custom index
series_labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print("\nLabeled Series:")
print(series_labeled)
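
A labeled Series can be accessed by label or by position. The sketch below reuses the labeled Series from above and shows both, plus boolean filtering.

```python
import pandas as pd

series_labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(series_labeled['b'])      # label-based lookup
print(series_labeled.loc['b'])  # explicit label-based lookup
print(series_labeled.iloc[0])   # position-based lookup

# Boolean filtering keeps only the entries where the condition holds
large_values = series_labeled[series_labeled > 15]
print(large_values)
```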

In [None]:
# Creating a DataFrame (2D labeled data structure)
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
    'salary': [70000, 80000, 75000, 85000]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)

# Basic info about DataFrame
print("\nDataFrame shape:", df.shape)
print("Column names:", df.columns.tolist())
print("Data types:\n", df.dtypes)
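
Once a DataFrame exists, the most common next steps are selecting columns and filtering rows. The following sketch rebuilds the same DataFrame so it runs on its own; the variable names `subset` and `over_28` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
    'salary': [70000, 80000, 75000, 85000],
})

ages = df['age']                 # one column name -> Series
subset = df[['name', 'salary']]  # list of column names -> DataFrame
over_28 = df[df['age'] > 28]     # boolean mask keeps matching rows

print(subset)
print("\nPeople older than 28:")
print(over_28)
```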

In [None]:
# Quick data exploration
print("First 3 rows:")
print(df.head(3))

print("\nLast 2 rows:")
print(df.tail(2))

print("\nDataFrame info:")
df.info()

print("\nSummary statistics:")
print(df.describe())
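
Note that `describe()` summarizes only numeric columns by default, so `name` and `city` are absent from its output. Passing `include='all'` adds the non-numeric columns as well. A minimal sketch, rebuilding the same DataFrame so the block is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'city': ['New York', 'Paris', 'London', 'Tokyo'],
    'salary': [70000, 80000, 75000, 85000],
})

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include='all')  # every column

print("Default columns:", numeric_summary.columns.tolist())
print("All columns:", full_summary.columns.tolist())
print("Mean salary:", df['salary'].mean())
```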

## 5. Introduction to Scikit-learn

Scikit-learn is the main library for machine learning in Python. It provides tools for classification, regression, clustering, and more.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Example: Simple linear regression on hypothetical data
# Let's create some sample data
np.random.seed(42)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = 2 * X.flatten() + 1 + np.random.randn(10) * 0.5  # y = 2x + 1 + noise

print("Features (X):")
print(X)
print("\nTarget (y):")
print(y)

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")

# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

print(f"\nModel coefficient (slope): {model.coef_[0]:.2f}")
print(f"Model intercept: {model.intercept_:.2f}")
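
`LinearRegression` fits its coefficients by ordinary least squares. For a single feature, the same kind of fit can be sketched with NumPy's `np.polyfit` as an independent sanity check. The example below uses noise-free data (`y_clean`, an illustrative name) so the known slope of 2 and intercept of 1 are recovered up to floating-point error.

```python
import numpy as np

x = np.arange(1, 11)  # 1, 2, ..., 10
y_clean = 2 * x + 1   # exact line y = 2x + 1, no noise

# A degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(x, y_clean, deg=1)

print(f"Slope: {slope:.2f}")
print(f"Intercept: {intercept:.2f}")
```

With the noisy `y` from this notebook, the fitted values would be close to, but not exactly, 2 and 1.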

In [None]:
# Make predictions
y_pred = model.predict(X_test)

print("Predictions vs Actual:")
for i, (pred, actual) in enumerate(zip(y_pred, y_test)):
    print(f"Test {i+1}: Predicted={pred:.2f}, Actual={actual:.2f}")

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nMean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")
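
To see what these two metrics measure, they can be computed by hand: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses small made-up arrays (`y_true` and `y_hat` are illustrative, not the notebook's data).

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])  # made-up targets
y_hat = np.array([2.5, 5.0, 7.5])   # made-up predictions

residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)  # mean squared error

ss_res = np.sum(residuals ** 2)                 # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
r2_manual = 1 - ss_res / ss_tot                 # coefficient of determination

print(f"MSE: {mse_manual:.4f}")
print(f"R2:  {r2_manual:.4f}")
```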

## 6. Visualization with Matplotlib

Matplotlib is a widely used plotting library for Python. Here is a quick introduction to plotting:

In [None]:
import matplotlib.pyplot as plt

# Simple line plot
# (named y_sin so the regression target `y` from Section 5 is not overwritten)
x = np.linspace(0, 10, 100)
y_sin = np.sin(x)

plt.figure(figsize=(10, 5))
plt.plot(x, y_sin, label='sin(x)', color='blue')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine Wave')
plt.legend()
plt.grid(True)
plt.show()
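
The same kind of plot can also be written with Matplotlib's object-oriented interface, where `plt.subplots()` returns a figure and an axes object to call methods on. The sketch below draws two curves and saves the result to a file. The `Agg` backend line and the output path are assumptions for running as a script without a display; in a notebook, drop the `matplotlib.use` line and call `plt.show()` instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumes no display is available
import matplotlib.pyplot as plt
import numpy as np

x_vals = np.linspace(0, 10, 100)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x_vals, np.sin(x_vals), label='sin(x)')
ax.plot(x_vals, np.cos(x_vals), label='cos(x)', linestyle='--')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Sine and Cosine')
ax.legend()
ax.grid(True)

# Save to the system temp directory (illustrative location)
out_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
print("Saved plot to:", out_path)
```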

In [None]:
# Scatter plot with our linear regression example
plt.figure(figsize=(10, 6))
plt.scatter(X_train, y_train, color='blue', label='Training data', alpha=0.6)
plt.scatter(X_test, y_test, color='green', label='Test data', alpha=0.6)
plt.plot(X, model.predict(X), color='red', label='Regression line', linewidth=2)
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression Example')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Practice Exercises

In [None]:
# Exercise 1: Create a NumPy array of numbers from 1 to 20
# Calculate and print the mean, median, and standard deviation

# Your code here:

In [None]:
# Exercise 2: Create a Pandas DataFrame with information about
# 5 products (name, price, quantity)
# Add a new column 'total_value' (price * quantity)

# Your code here:

In [None]:
# Exercise 3: Create a simple bar plot showing
# the ages from the DataFrame we created earlier

# Your code here: