# Chapter 1. Introduction

## 1.2. Common Python Modules

In this section, we will learn the essentials of common Python modules. The following modules will be covered:

1. NumPy
2. Pandas
3. Matplotlib and Seaborn

### 1.2.1. NumPy

#### 1.2.1.1. Installation

To start using NumPy, you need to install it in your Python environment. The default Anaconda environment already includes NumPy, so no installation is needed there. To install NumPy in a custom environment, run the following command:

In [None]:
!conda install -y numpy

For more information about NumPy, see the [documentation](https://numpy.org/doc/).

#### 1.2.1.2. Importing NumPy

To use NumPy in your Python code, you need to import the library. It's a common convention to import NumPy as `np`:

In [None]:
import numpy as np

#### 1.2.1.3. Creating NumPy Arrays

NumPy's primary data structure is the N-dimensional array, `numpy.ndarray`, which you typically create with the `np.array()` function. You can create arrays in various ways. Here are a few examples:

1. Creating an Array from a List

In [None]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_list)
print(type(my_list))
print(my_array)
print(type(my_array))

2. Creating an Array of Zeros

In [None]:
zeros_array = np.zeros(5)  # Creates a 1D array with 5 zeros
print(zeros_array)

3. Creating an Array of Ones

In [None]:
ones_array = np.ones((3, 3))  # Creates a 3x3 array of ones
print(ones_array)

4. Creating a Range of Values

In [None]:
range_array = np.arange(0, 10, 2)  # Creates [0, 2, 4, 6, 8]; the stop value 10 is excluded
print(range_array)

In [None]:
linspace_array = np.linspace(2.0, 3.0, num=5)  # Creates 5 evenly spaced values [2.0, 2.25, 2.5, 2.75, 3.0]; the endpoint 3.0 is included
print(linspace_array)
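
Every array created this way has a shape and an element type. As a small sketch (the name `matrix` is only illustrative), the attributes `shape`, `ndim`, `size`, and `dtype` can be used to inspect an array:

```python
import numpy as np

# A nested list becomes a 2D array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3)
print(matrix.ndim)   # 2
print(matrix.size)   # 6
print(matrix.dtype)  # an integer type; the exact width depends on the platform

# np.zeros (like np.ones) produces floating-point values by default
print(np.zeros(3).dtype)  # float64
```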

#### 1.2.1.4. Basic NumPy Operations

NumPy allows you to perform mathematical operations on arrays efficiently. Here are some examples:

**Get An Element**

As with a Python list, you can access an element of a NumPy array by its index:

In [None]:
array1 = np.array([1, 2, 3])
element1 = array1[0]
print(element1)

**Add Elements**

You can add elements to a NumPy array using the `np.append()` or `np.concatenate()` function. For `np.concatenate()`, pass all input arrays together in a tuple. Unlike `list.append()`, these functions do not modify the original array; they return a new array, so assign the result to a variable.

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.append(array1, 10)
print(array1)
print(array2)

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.concatenate((array1, array2))
print(array1)
print(array2)
print(array3)

**Slicing**

NumPy slicing is a powerful and flexible way to extract and manipulate portions of an array. Slicing allows you to select a subset of elements from an array by specifying a range of indices.

***Basic Slicing:***

Syntax: `array[start:stop]`
- Returns a view of the array elements from the index start (inclusive) to the index stop (exclusive).
- If `start` is not provided, it defaults to `0`. If `stop` is not provided, it defaults to the end of the array.
- You can use negative indices to count from the end of the array.

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[2:5]  # Slices from index 2 to 4
print(sliced)

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[0:-2]  # Slices from index 0 to 3 (drops the last 2 elements)
print(sliced)
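
Because basic slicing returns a view rather than a copy, writing to a slice also changes the original array. The following sketch illustrates this with hypothetical names (`base`, `view`, `independent`) and uses `.copy()` for the case where an independent array is wanted:

```python
import numpy as np

base = np.array([0, 1, 2, 3, 4, 5])

view = base[2:5]   # a view: shares memory with base
view[0] = 99       # also changes base[2]
print(base[2])     # 99

independent = base[2:5].copy()  # a separate array with its own memory
independent[0] = -1             # base is not affected
print(base[2])                  # still 99
```

If you plan to modify a slice and want to keep the original data intact, calling `.copy()` first is the usual precaution.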

***Step Slicing:***

Syntax: `array[start:stop:step]`
- The `step` argument allows you to skip elements while slicing.
- If `step` is not specified, it defaults to `1`.

In [None]:
arr = np.array([0, 1, 2, 3, 4, 5])
sliced = arr[1:5:2]  # Slices from index 1 to 4 with a step of 2
print(sliced)
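
The `start` and `stop` values can also be omitted when a step is given, and the step may be negative. A brief sketch, with illustrative variable names:

```python
import numpy as np

values = np.array([0, 1, 2, 3, 4, 5])

every_other = values[::2]       # indices 0, 2, 4
reversed_values = values[::-1]  # walks the array backwards
from_end = values[-1:1:-2]      # starts at the last index, stops before index 1

print(every_other.tolist())      # [0, 2, 4]
print(reversed_values.tolist())  # [5, 4, 3, 2, 1, 0]
print(from_end.tolist())         # [5, 3]
```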

**Element-wise Operations**

You can perform element-wise operations like addition, subtraction, multiplication, and division:

In [None]:
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

result1 = array1 + array2  # Element-wise addition
result2 = array1 + 10  # Element-wise addition with a scalar
result3 = array1 * array2  # Element-wise multiplication
result4 = array1 * 5  # Element-wise multiplication with a scalar
print(result1)
print(result2)
print(result3)
print(result4)
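
To see why element-wise behavior matters, it can help to compare arrays with plain Python lists, where `+` and `*` mean something different. The sketch below uses illustrative names and also covers subtraction and division:

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array(numbers_list)

# Python lists: + concatenates and * repeats
print(numbers_list + numbers_list)  # [1, 2, 3, 1, 2, 3]
print(numbers_list * 2)             # [1, 2, 3, 1, 2, 3]

# NumPy arrays: each operator is applied element by element
doubled = numbers_array * 2     # values 2, 4, 6
difference = numbers_array - 1  # values 0, 1, 2
halved = numbers_array / 2      # values 0.5, 1.0, 1.5 (true division gives floats)
print(doubled, difference, halved)
```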

**Dot Product**

You can calculate the dot product of two arrays:

In [None]:
dot_product = np.dot(array1, array2)
print(dot_product)
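
For one-dimensional arrays, the dot product is the sum of the element-wise products. As a quick check (the names `vec_a` and `vec_b` are illustrative), the three expressions below give the same value; the `@` operator is an alternative spelling for 1D arrays:

```python
import numpy as np

vec_a = np.array([1, 2, 3])
vec_b = np.array([4, 5, 6])

manual = np.sum(vec_a * vec_b)   # 1*4 + 2*5 + 3*6 = 32
with_dot = np.dot(vec_a, vec_b)
with_operator = vec_a @ vec_b

print(manual, with_dot, with_operator)  # 32 32 32
```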

**Statistical Functions**

NumPy provides various statistical functions, such as maximum, minimum, mean, median, and standard deviation:

In [None]:
data = np.array([1, 2, 3, 4, 5])
max_value = np.max(data)
min_value = np.min(data)
mean_value = np.mean(data)
median_value = np.median(data)
std_deviation = np.std(data)
print(max_value)
print(min_value)
print(mean_value)
print(median_value)
print(std_deviation)
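
One detail worth knowing: by default, `np.std()` computes the population standard deviation (dividing by N). If you need the sample standard deviation (dividing by N - 1), you can pass `ddof=1`. A minimal sketch with an illustrative array:

```python
import numpy as np

sample = np.array([1, 2, 3, 4, 5])

population_std = np.std(sample)      # sqrt(10 / 5), about 1.414
sample_std = np.std(sample, ddof=1)  # sqrt(10 / 4), about 1.581

print(population_std)
print(sample_std)
```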

### 1.2.2. Pandas

#### 1.2.2.1. Installation

To start using Pandas, you need to install it in your Python environment. The default Anaconda environment already includes Pandas, so no installation is needed there. To install Pandas in a custom environment, run the following command:

In [None]:
!conda install -y pandas

For more information about Pandas, see the [documentation](https://pandas.pydata.org/docs/).

#### 1.2.2.2. Importing Pandas

To use Pandas in your Python code, you need to import the library. It's a common convention to import Pandas as `pd`:

In [None]:
import pandas as pd

#### 1.2.2.3. Pandas Data Structures

Pandas provides two primary data structures: Series and DataFrame.

**Series**

A Series is a one-dimensional, labeled array that can hold data of any type. You can think of it as a single column in a spreadsheet. Here's how you can create a Series:

In [None]:
data = pd.Series([1, 3, 5, 7, 9])
print(data)
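
A Series also carries an index, which defaults to 0, 1, 2, and so on. As a small sketch (the name `ages` and its labels are illustrative), you can supply your own labels and then look values up either by label or by position:

```python
import pandas as pd

ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'])

print(ages['Bob'])          # 30, lookup by label
print(ages.iloc[0])         # 25, lookup by position
print(ages.index.tolist())  # ['Alice', 'Bob', 'Charlie']
```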

**DataFrame**

A DataFrame is a two-dimensional tabular data structure with rows and columns, similar to a spreadsheet or a SQL table. You can create a DataFrame using dictionaries, lists, or other data structures:

In [None]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40]
}

df = pd.DataFrame(data)
print(df)
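
Once a DataFrame exists, a few attributes and indexing forms are enough to inspect it. The sketch below rebuilds the same table under the illustrative name `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

print(people.shape)             # (4, 2): 4 rows, 2 columns
print(people.columns.tolist())  # ['Name', 'Age']

age_column = people['Age']      # a single column is a Series
name_table = people[['Name']]   # a list of column names gives a DataFrame
first_row = people.iloc[0]      # the first row, selected by position

print(first_row['Name'])        # Alice
```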

#### 1.2.2.4. Reading and Writing Data

Pandas supports various file formats for reading and writing data, including CSV, Excel, SQL, and more. Here are some examples of reading and writing data:

**Reading Data**

In [None]:
# Read data from a CSV file
df = pd.read_csv('./datasets/IrisFlower.csv')
print(df.head()) # Show the top 5 rows

In [None]:
# Read data from an Excel file (requires an Excel engine such as openpyxl to be installed)
df = pd.read_excel('./datasets/IrisFlower.xlsx')
print(df.head(10)) # Show the top 10 rows

**Writing Data**

In [None]:
# Write data to a CSV file
df.to_csv('output.csv', index=False)

In [None]:
# Write data to an Excel file
df.to_excel('output.xlsx', index=False)
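
The `index=False` argument keeps the row index out of the output file. To illustrate its effect without touching the file system, the sketch below writes a small, made-up table to an in-memory buffer (`io.StringIO`) and reads it back; the names `table`, `restored`, and `reloaded` are illustrative:

```python
import io
import pandas as pd

table = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write without the row index, then read the text back
without_index = io.StringIO()
table.to_csv(without_index, index=False)
restored = pd.read_csv(io.StringIO(without_index.getvalue()))
print(restored.columns.tolist())  # ['Name', 'Age']

# Write with the row index (the default): it comes back as an extra column
with_index = io.StringIO()
table.to_csv(with_index)
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'Name', 'Age']
```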

#### 1.2.2.5. Basic Data Manipulation

Pandas allows you to perform various data manipulation tasks, such as filtering, sorting, and aggregating data.

**Filtering Data**

You can filter data based on specific conditions:

In [None]:
filtered_df = df[df['Sepal length'] > 5.0]
print(filtered_df.head())
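
Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses. The sketch below uses a small, made-up table named `flowers` with the same column names as the Iris file, so it runs without the dataset:

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset
flowers = pd.DataFrame({
    'Sepal length': [5.1, 4.9, 6.3, 5.8],
    'Sepal width': [3.5, 3.0, 3.3, 2.7],
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
})

# Rows where both conditions hold
long_and_wide = flowers[(flowers['Sepal length'] > 5.0) & (flowers['Sepal width'] > 3.0)]
print(long_and_wide.index.tolist())  # [0, 2]

# Rows where at least one condition holds
setosa_or_long = flowers[(flowers['Species'] == 'setosa') | (flowers['Sepal length'] > 6.0)]
print(setosa_or_long.index.tolist())  # [0, 1, 2]
```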

**Sorting Data**

You can sort data by one or more columns:

In [None]:
sorted_df = df.sort_values(by='Sepal length')
print(sorted_df.head())
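
`sort_values()` returns a new DataFrame and sorts in ascending order by default. As a sketch on a small, made-up table (the name `flowers` and its values are illustrative), you can reverse the order with `ascending=False` or sort by several columns at once:

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset
flowers = pd.DataFrame({
    'Sepal length': [4.9, 5.1, 5.8, 6.3],
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
})

# Largest sepal length first
longest_first = flowers.sort_values(by='Sepal length', ascending=False)
print(longest_first['Sepal length'].tolist())  # [6.3, 5.8, 5.1, 4.9]

# Species A to Z, then sepal length from largest to smallest within each species
by_species = flowers.sort_values(by=['Species', 'Sepal length'], ascending=[True, False])
print(by_species['Sepal length'].tolist())  # [5.1, 4.9, 6.3, 5.8]
```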

**Aggregating Data**

You can perform operations like sum, mean, and count on specific columns:

In [None]:
# Mean sepal width
mean_sepal_width = df['Sepal width'].mean()
print(mean_sepal_width)

In [None]:
# Number of species
num_species = df['Species'].nunique()
print(num_species)
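
Aggregations are often computed per group rather than over the whole column. A minimal sketch using `groupby()` and `value_counts()` on a small, made-up table (the name `flowers` and its values are illustrative):

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset
flowers = pd.DataFrame({
    'Sepal length': [5.1, 4.9, 6.3, 5.8],
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
})

# Mean sepal length for each species
mean_by_species = flowers.groupby('Species')['Sepal length'].mean()
print(mean_by_species)

# Number of rows for each species
rows_per_species = flowers['Species'].value_counts()
print(rows_per_species)
```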

### 1.2.3. Matplotlib and Seaborn

#### 1.2.3.1. Installation

To use Matplotlib and Seaborn, you need to install them in your Python environment. You can do this with Anaconda by running the following commands:

In [None]:
!conda install -y matplotlib

In [None]:
!conda install -y seaborn

[Matplotlib documentation](https://matplotlib.org/stable/index.html)

[Seaborn documentation](https://seaborn.pydata.org/)

#### 1.2.3.2. Importing Matplotlib and Seaborn

To use Matplotlib and Seaborn in your Python code, you need to import the libraries:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

#### 1.2.3.3. Matplotlib

Matplotlib is a versatile library that provides a wide range of options for creating static, animated, or interactive visualizations. It's well-suited for creating various types of plots, including line plots, bar plots, scatter plots, and more.

**Example: Line Plot**

Let's create a simple line plot to visualize a set of data points:

In [None]:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
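
The same plot can carry several lines, each with its own color, line style, and legend entry. The sketch below uses Matplotlib's `plt.subplots()` interface, which returns a figure and an axes object; the data and variable names are illustrative:

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4, 5]
linear = [2 * v for v in xs]    # 2, 4, 6, 8, 10
squares = [v ** 2 for v in xs]  # 1, 4, 9, 16, 25

fig, ax = plt.subplots()
ax.plot(xs, linear, label="y = 2x", color="tab:blue", linestyle="--", marker="o")
ax.plot(xs, squares, label="y = x^2", color="tab:orange")
ax.set_title("Two Lines on One Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()  # builds the legend from the label of each line
plt.show()
```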

#### 1.2.3.4. Seaborn

Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics. It simplifies many common tasks and offers various built-in themes and color palettes.

**Example: Scatter Plot**

Let's create a scatter plot using Seaborn to visualize the relationship between two variables:

In [None]:
# Load a sample dataset
data = pd.read_csv('./datasets/IrisFlower.csv')

# Create a scatter plot
sns.scatterplot(data=data, x='Sepal length', y='Sepal width')
plt.title("Scatter Plot")
plt.show()
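
Seaborn can also color points by a categorical column through the `hue` parameter. The sketch below assumes a small, made-up table (`flowers`) with the same column names as the Iris file, so it runs without the dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows, not taken from the real dataset
flowers = pd.DataFrame({
    'Sepal length': [5.1, 4.9, 6.3, 5.8],
    'Sepal width': [3.5, 3.0, 3.3, 2.7],
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
})

# hue assigns one color per species and adds a legend
ax = sns.scatterplot(data=flowers, x='Sepal length', y='Sepal width', hue='Species')
ax.set_title("Sepal Size by Species")
plt.show()
```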

Both Matplotlib and Seaborn allow you to customize your plots extensively. You can modify colors, labels, titles, legends, and more to make your visualizations informative and visually appealing.