# Introduction and Overview

In this module, you will learn how to navigate Jupyter Notebooks. You will create code cells for running your code and markdown cells for documentation and notes. Afterwards, you will get a quick introduction to two Python libraries, NumPy and Pandas, which you will use to handle and explore the data in your machine learning projects. Let's get started!

<b>Functions and attributes in this lecture: </b>
- `numpy:` - NumPy package with alias `np`
 - `np.array()` - Creates a NumPy array from a list.
 - `.shape` - Gives the shape (the length along each dimension) of a NumPy array.
 - `.size` - Gives you the total number of elements in the NumPy array.
 - `np.sqrt()` - Gives you the square root of each element of the array.
- `pandas:` - Pandas package with alias `pd`
 - `pd.DataFrame()` - Create a Pandas dataframe from e.g. a Python dictionary.
 - `pd.read_csv()` - Reads a dataframe from a CSV file.
 - `.describe()` - A method for giving statistical information about the dataframe.
 - `.info()` - A method for giving basic information about the dataframe.
 - `.value_counts()` - A method for counting how often each category occurs.
 - `.drop()` - A method for dropping a column.
 - `.corr()` - A method for computing the correlation matrix.
 - `.head()` - A method for displaying the first rows of a dataframe.
 - `.plot()` - A method for displaying a line plot of the dataframe.
 - `.hist()` - A method for displaying a histogram of the dataframe.

## Introduction to Jupyter Notebook

Here you will get familiar with Jupyter Notebooks!

In [None]:
# A basic code cell


This is a markdown cell. Here you can write markdown text to create <b>great</b> notes! You can also write mathematics here with $$\sum_{n = 1}^{m} \frac{1}{n}.$$ To learn more about the markdown syntax, check out the following <a href="https://www.markdownguide.org/basic-syntax">Markdown Tutorial</a>.

In [None]:
# Getting help

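A minimal sketch of how the help cell above could be filled in. Using `len` as the object to look up is only an illustrative choice; any function or object works the same way.

```python
# help() prints the documentation of any Python object.
help(len)

# The same text is stored in the object's __doc__ attribute.
doc = len.__doc__
print(doc)

# In a Jupyter code cell (IPython syntax, not plain Python) you can also
# append a question mark, e.g. `len?`, to display the documentation.
```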

## Short Recap for NumPy and Pandas

### NumPy

NumPy is a Python library for working efficiently with vectors and matrices. The main datatype in NumPy is the <b>ndarray</b>, which represents an n-dimensional array. In this course, we will use NumPy for the inputs and outputs of machine learning models. To learn more about NumPy, you can take a look at the <a href="https://numpy.org/">NumPy Documentation</a>.
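
To make the idea of an n-dimensional array concrete, here is a minimal sketch. The values and the variable names `vector` and `matrix` are made up for illustration.

```python
import numpy as np

# A 1-D array (vector) built from a Python list.
vector = np.array([1.0, 2.0, 3.0])

# A 2-D array (matrix) built from a list of lists.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Both are instances of the same type, numpy.ndarray.
print(type(vector), type(matrix))

# ndim is the number of dimensions, shape the length along each one.
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```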

In [None]:
# It is convention to import numpy with the alias np


#### 1-D Array

In [None]:
# Create two numpy 1D arrays


In [None]:
# Display a


In [None]:
# Getting out the second element


In [None]:
# Compute their sum


In [None]:
# Pointwise multiplication


In [None]:
# Multiply by numbers


In [None]:
# The square root of each entry


In [None]:
# The size (how many entries) of an array

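The 1-D cells above could be filled in along these lines. This is only a sketch: the arrays `a` and `b` and their values are assumptions, chosen so the results are easy to check by hand.

```python
import numpy as np

# Two 1-D arrays of the same length (illustrative values).
a = np.array([1, 4, 9])
b = np.array([2, 3, 4])

second = a[1]          # indexing starts at 0, so this is 4
total = a + b          # elementwise sum: [3, 7, 13]
product = a * b        # pointwise multiplication: [2, 12, 36]
scaled = 3 * a         # every entry multiplied by 3: [3, 12, 27]
roots = np.sqrt(a)     # square root of each entry: [1., 2., 3.]
n_entries = a.size     # number of entries: 3

print(second, total, product, scaled, roots, n_entries)
```

Note that `a * b` multiplies entry by entry; it is not a dot product.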

#### 2-D Array

In [None]:
# Create a 2D array (matrix)


In [None]:
# Display the matrix


In [None]:
# Get the second row


In [None]:
# Get the second column


In [None]:
# Get one element in the matrix


In [None]:
# Multiply by numbers


In [None]:
# Add the matrices


In [None]:
# Checking the shape of an ndarray


In [None]:
# Getting the total size of the ndarray

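A possible way to work through the 2-D cells above, again as a sketch. The matrices `A` and `B` and their values are invented for this example.

```python
import numpy as np

# Two 2x3 matrices (illustrative values).
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

second_row = A[1]       # [4, 5, 6]
second_col = A[:, 1]    # [2, 5]
element = A[0, 2]       # row 0, column 2 -> 3
scaled = 2 * A          # every entry doubled
summed = A + B          # elementwise sum of the two matrices

print(A.shape)  # (2, 3): 2 rows, 3 columns
print(A.size)   # 6: total number of entries
```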

### Pandas

Pandas is a Python library for data exploration and data cleaning, including the handling of missing values. The main datatype in Pandas is the <b>dataframe</b>, which represents tabular data whose features (columns) can have different types. In this course, we will use Pandas to clean the data before using it in machine learning models. To learn more about Pandas, you can take a look at the <a href="https://pandas.pydata.org/docs/user_guide/index.html">Pandas Documentation</a>.
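
As a small illustration of a dataframe with features of different types and a missing value, consider the sketch below. The column names and values are invented for this example.

```python
import numpy as np
import pandas as pd

# Build a dataframe from a Python dictionary: keys become column names.
df = pd.DataFrame({
    "name": ["Ada", "Bo", "Cy"],        # text feature
    "age": [36, 41, 29],                # integer feature
    "height_m": [1.70, np.nan, 1.82],   # float feature with one missing value
})

# Selecting a single column returns a pandas Series.
ages = df["age"]

# Count the missing values in each column.
missing = df.isna().sum()
print(missing)

# One simple cleaning step: drop every row that contains a missing value.
cleaned = df.dropna()
print(cleaned)
```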

In [None]:
# It is convention to import pandas with the alias pd


In [None]:
# Manually creating a dataframe


In [None]:
# Displaying the dataframe


In [None]:
# Selecting a column


In [None]:
# Download the iris dataset


In [None]:
# Display the dataset


In [None]:
# Display the first 10 rows


In [None]:
# Statistical info


In [None]:
# General info about all columns


In [None]:
# Count each value in a column


In [None]:
# Giving out the correlation matrix

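The exploration cells above could look roughly like the following. To keep the sketch runnable offline, it uses a tiny hand-made stand-in instead of the real iris data: the column names `sepal_length`, `sepal_width`, and `species` are assumed to match the downloaded file, and the six rows are invented. With the real dataset you would start from `pd.read_csv()` instead.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (values are made up).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

print(iris.head(3))       # first 3 rows (head() defaults to 5)
print(iris.describe())    # count, mean, std, quartiles of numeric columns
iris.info()               # column names, dtypes, non-null counts

# How often each category occurs in one column.
counts = iris["species"].value_counts()
print(counts)

# Correlation matrix; numeric_only=True skips the text column
# (this parameter assumes pandas 1.5 or newer).
corr = iris.corr(numeric_only=True)
print(corr)
```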

In [None]:
# Trying to drop a column


In [None]:
# To make it permanent without a new assignment, we use inplace=True

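The difference between the two drop cells above can be sketched as follows. The small dataframe and its column names are placeholders, not the dataset used in the lecture.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop() returns a new dataframe and leaves the original untouched.
without_c = df.drop(columns=["c"])
print(list(df.columns))         # ['a', 'b', 'c']
print(list(without_c.columns))  # ['a', 'b']

# With inplace=True the original is modified and the call returns None.
result = df.drop(columns=["b"], inplace=True)
print(list(df.columns))         # ['a', 'c']
print(result)                   # None
```

Reassigning the result, as in `df = df.drop(columns=["c"])`, is an equivalent way to make the change permanent.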

In [None]:
# Line plot of a single feature


In [None]:
# Line plot of all the features


In [None]:
# Scatter plot of sepal_width versus sepal_length


In [None]:
# Histogram of a single feature


In [None]:
# Histogram of all the features

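A rough sketch of the plotting cells above. Pandas plotting relies on Matplotlib being installed. The dataframe is again a made-up stand-in with assumed column names, and the `Agg` backend line is only there so the script also runs without a display; inside a notebook you can leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris features (values are made up).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.4, 6.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.3, 2.7],
})

# Line plot of a single feature, then of all features.
ax_single = df["sepal_length"].plot()
ax_all = df.plot()

# Scatter plot of sepal_width versus sepal_length.
ax_scatter = df.plot(kind="scatter", x="sepal_length", y="sepal_width")

# Histogram of a single feature (on a fresh figure), then one per column.
plt.figure()
ax_hist = df["sepal_length"].hist()
axes_all = df.hist()

plt.close("all")
```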