# **Introduction to Machine Learning and Artificial Intelligence (August - September 2024)**


<br>

![alt text](image.png)

**Lecturer:** Dr. Darshan Ingle

**Modules Covered:**
Matplotlib (matplotlib), WordCloud (wordcloud), HuggingFace Transformers (transformers), FastText (fasttext), NumPy (numpy), SMOTE (imblearn.over_sampling.SMOTE), GloVe (glove-python), Keras API (tensorflow.keras), NLTK (nltk), Seaborn (seaborn), TQDM (tqdm), TensorFlow (tensorflow), Pandas (pandas), Scikit-learn (sklearn)

<br>
<br>

# Day 1: Introduction to Data Science, Numpy, Pandas, Matplotlib

## Numpy (Numerical Python):
**1. ndarray:** N-dimensional array used for handling large datasets in numerical computing.

**2. np.array:** The function that builds an ndarray from Python data such as lists. The resulting arrays support vectorized operations, which makes large-scale calculations far faster than Python loops.

**3. Attributes:**
* **dtype:** Data type of elements in the array.
* **ndim:** Number of dimensions of the array.
* **shape:** A tuple giving the size of each dimension, e.g. (rows, columns) for a 2-D array.
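
As a quick illustration of the three attributes above, here is a minimal sketch. The array contents and the variable name `a` are made up for the example.

```python
import numpy as np

# A small 2x3 array with an explicit integer dtype (values are arbitrary)
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.dtype)  # data type of the elements
print(a.ndim)   # number of dimensions
print(a.shape)  # size of each dimension as a tuple
```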

**4. Creating arrays:**
* **np.array(data, dtype):** Creates a numpy array with specific data and data type.
* **np.zeros((2,3)):** Creates a 2x3 matrix filled with zeroes.
* **np.ones((2,3)):** Creates a 2x3 matrix filled with ones.
* **np.identity(3):** Generates a 3x3 identity matrix.
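* **np.random.randint(low, high, size):** Generates random integers from `low` (inclusive) up to `high` (exclusive). Note that `np.random.randint(3, 4)` can only ever return 3.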
* **np.arange(12):** Creates an array with a sequence of numbers.
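* **arr.reshape(3, 4):** Reshapes an existing array into a 3x4 matrix; the total number of elements must stay the same (equivalently, `np.reshape(arr, (3, 4))`).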
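
The creation routines listed above can be tried together in a short sketch. The variable names and the `randint` bounds (0 to 10, five values) are assumptions chosen for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))     # 2x3 matrix of 0.0
ones = np.ones((2, 3))       # 2x3 matrix of 1.0
eye = np.identity(3)         # 3x3 identity matrix

# Five random integers: low (0) is inclusive, high (10) is exclusive
rand_ints = np.random.randint(0, 10, size=5)

seq = np.arange(12)          # 0, 1, ..., 11
grid = seq.reshape(3, 4)     # same 12 values arranged as 3 rows x 4 columns

print(grid)
```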

**5. Linear Algebra Operations:**
* **np.linalg.solve(A, b):** Solves the linear system `A x = b` for `x`, where `A` is a square coefficient matrix.
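
A small worked example of `np.linalg.solve`, assuming a made-up system of two equations (2x + y = 5 and x + 3y = 10):

```python
import numpy as np

# Coefficient matrix and right-hand side for:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)
print(x)  # approximately [1. 3.]

# Sanity check: multiplying back should reproduce b
print(np.allclose(A @ x, b))
```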

**6. Broadcasting:**
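* Allows NumPy to perform element-wise operations between arrays of different but compatible shapes by virtually stretching the smaller array; shapes are compatible when, compared from the last dimension backwards, each pair of sizes is equal or one of them is 1.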
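
To make broadcasting concrete, here is a minimal sketch that adds a row vector and a column vector to a 3x4 matrix. All values and names are illustrative.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)      # shape (3, 4)
row = np.array([10, 20, 30, 40])          # shape (4,)
col = np.array([[100], [200], [300]])     # shape (3, 1)

# row is stretched across the 3 rows; col is stretched across the 4 columns
plus_row = matrix + row
plus_col = matrix + col

print(plus_row)
print(plus_col)
```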

**7. Common in Computer Imaging:**
* Numpy is frequently used in computer vision tasks for image data manipulation.

## Pandas (Python Data Analysis Library):
**1. Series and DataFrames:**
* **Series:** A one-dimensional labeled array.
* **DataFrame:** A two-dimensional labeled table (rows and columns) for manipulating tabular data; each column is a Series.
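
A minimal sketch of both structures, using invented names and scores:

```python
import pandas as pd

# Series: values plus an index of labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])  # access by label

# DataFrame: a table whose columns are Series sharing one index
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "score": [90, 85, 77],
})
print(df.head())   # first rows (up to 5 by default)
print(df.shape)    # (number of rows, number of columns)
```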

**2. Basic Operations:**
* **Indexes:** Label-based indexing for accessing data.
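* **Pandas broadcasting:** Arithmetic on a DataFrame or Series is applied element-wise, with operands aligned on their index and column labels.
* **pd.read_csv():** Reads data from a CSV file into a DataFrame; related readers such as `pd.read_excel()` and `pd.read_json()` handle other formats.
* **head():** Displays the first rows of a DataFrame (5 by default).
* **shape:** Returns a tuple of (number of rows, number of columns).
* **pd.set_option():** Adjusts display settings, e.g. `display.max_rows` and `display.max_columns`.
* **Limit:** As a rule of thumb, Pandas is best suited to datasets that fit in memory (roughly up to 5-10 million rows, depending on the machine and column count); for larger datasets, PySpark (Apache Spark) or similar distributed tools are recommended.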

**3. Useful Methods:**
* **unique():** Returns unique values in a column.
* **value_counts(normalize=True):** Counts occurrences of each value; with `normalize=True` it returns relative frequencies (proportions) instead of raw counts.
* **info():** Provides information about the DataFrame's structure.
* **df.values:** Returns the data as a NumPy array.
* **describe(include='all'):** Summarizes data statistics, including all columns.
* **select_dtypes(include/exclude):** Selects specific data types in the DataFrame.
* **pd.date_range(start, periods, freq):** Creates a range of dates with `periods` entries spaced by `freq`.
* **apply():** Applies a function along an axis of the DataFrame.
* **sum(axis=0/1):** Computes sums; `axis=0` sums down each column, `axis=1` sums across each row.
* **loc vs. iloc:** `loc` selects by index and column labels, while `iloc` selects by integer position.
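
The methods above are easier to compare side by side on a tiny table. The following is only a sketch: the DataFrame, its column names, and its index labels are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Pune", "Mumbai", "Pune", "Delhi"],
        "sales": [100, 250, 150, 300],
        "returns": [5, 10, 0, 20],
    },
    index=["w", "x", "y", "z"],
)

cities = df["city"].unique()                        # distinct values, in order of appearance
shares = df["city"].value_counts(normalize=True)    # proportions instead of counts

numeric = df.select_dtypes(include="number")        # keep only numeric columns
col_totals = numeric.sum(axis=0)                    # one total per column
row_totals = numeric.sum(axis=1)                    # one total per row

# apply() with axis=1 passes each row to the function
net = numeric.apply(lambda row: row["sales"] - row["returns"], axis=1)

by_label = df.loc["y", "sales"]   # label-based lookup
by_position = df.iloc[2, 1]       # position-based lookup of the same cell

dates = pd.date_range(start="2024-08-01", periods=3, freq="D")

df.info()
print(df.describe(include="all"))
```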

**4. Merging and Concatenation:**
* **pd.merge(on='column'):** Merges DataFrames on a common column.
* **pd.concat:** Concatenates two or more DataFrames.
* **Sorting:**
    * **sort_values():** Sorts DataFrames by specified values.
    * **reset_index():** Resets the index of a DataFrame.
* **groupby():** Groups rows by the values in one or more columns so that an aggregation (e.g. `sum()`, `mean()`) can be computed per group.
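
A short sketch tying merging, concatenation, sorting, and grouping together. The `orders` and `customers` tables are hypothetical sample data.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 1, 3], "amount": [50, 20, 30, 70]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})

# Merge on the shared key column (inner join by default)
merged = pd.merge(orders, customers, on="customer_id")

# Concatenate extra rows below the existing ones and renumber the index
extra = pd.DataFrame({"customer_id": [2], "amount": [40]})
all_orders = pd.concat([orders, extra], ignore_index=True)

# Sort by amount (largest first), then replace the shuffled index with 0..n-1
ranked = merged.sort_values("amount", ascending=False).reset_index(drop=True)

# Total amount per customer name
totals = merged.groupby("name")["amount"].sum()

print(ranked)
print(totals)
```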

## Matplotlib (MATLAB-style Plotting Library):
**1. Plotting:**
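* **np.linspace(0, 10, 5):** Generates 5 evenly spaced values from 0 to 10 (both endpoints included), useful as x-coordinates for plotting.
* **plt.figure(figsize=(3,4)):** Creates a figure 3 inches wide and 4 inches tall.
* **plt.subplot(2,1,1):** Selects the first panel in a grid of 2 rows and 1 column; subsequent plot calls draw into it.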
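
The three calls above combine as in the sketch below. The plotted functions are arbitrary, and the `Agg` backend line is an assumption added so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 5)          # 0, 2.5, 5, 7.5, 10

fig = plt.figure(figsize=(3, 4))   # width 3 in, height 4 in

plt.subplot(2, 1, 1)               # top panel of a 2-row, 1-column grid
plt.plot(x, x ** 2)

plt.subplot(2, 1, 2)               # bottom panel
plt.plot(x, np.sqrt(x))
```

In a notebook or desktop session, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.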

**2. Subplotting:**
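* **fig, ax = plt.subplots(2):** Creates a figure and an array of two vertically stacked axes in one call.
* **grid:** `plt.grid(True)` or `ax.grid(True)` adds a grid to the plot.
* **plt.axis('equal'):** Ensures equal scaling for both axes, so that, for example, circles are not drawn as ellipses.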
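
A minimal sketch of the object-style interface described above. The sine curve and unit circle are illustrative choices, and the `Agg` backend line is again an assumption for display-less environments.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots(2)          # ax is an array holding two Axes objects

ax[0].plot(theta, np.sin(theta))   # sine wave in the top panel
ax[0].grid(True)                   # turn the grid on for this panel only

ax[1].plot(np.cos(theta), np.sin(theta))  # unit circle in the bottom panel
ax[1].axis("equal")                # equal scaling keeps the circle round
```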