<div align="center">
<h3> CS 178: Machine Learning & Data Mining </h3>
<h1> Discussion, Week 0 </h1>
</div>

---
## Part 1 : Setting up Your Conda Environment

### What is Conda?

[Conda](https://docs.conda.io/en/latest/#) is an open-source package and environment manager for Python. Using Conda, we can easily install, update, or remove various Python packages. One of the key features of Conda is that it lets us create separate environments, which allows us to install different packages (or even different versions of the same package) for different projects. For example, if we are working on a project which requires Python 2.7 and we're also working on a separate project which requires Python 3.10, we can maintain a separate Conda environment for each project in order to easily switch between the two.

If you have used Python before, you've probably used pip to install packages. For a comparison between pip and Conda, check out [this blog post](https://www.anaconda.com/blog/understanding-conda-and-pip).

### Installing Python and Conda via Miniconda
In this tutorial, we will use [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to install Python and Conda. 
1. Download the correct installer for your system [here](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links), and follow the instructions to install Miniconda. 

There is also [Anaconda](https://www.anaconda.com/), which is a distribution of Conda that comes with many popular data science packages in addition to Conda. It doesn't matter too much which one you use (see [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/download.html#anaconda-or-miniconda) for some guidelines), but in this tutorial we'll be using Miniconda.

###  Creating a Conda Environment

Once you've installed Conda, open up a terminal (on Linux / Mac) or the Anaconda Prompt (on Windows). We'll now set up an environment for CS178 and install some necessary packages.

1. Let's first verify that conda is installed correctly. In your terminal, run the command `conda --version` and verify that the output is something like `conda 23.7.2` or higher.
2. Now, we'll create a new Conda environment named `cs178` with Python 3.10. To do this, run the command `conda create --name cs178 python=3.10`. If you ever forget the name of an environment you've created, you can list all of your environments with `conda env list`.
3. To use our new environment, we must first activate it with `conda activate cs178`. The name of the active environment should now be displayed in front of your prompt in parentheses. An environment can be deactivated using `conda deactivate`.
4. Let's install some packages in our `cs178` environment. First, run the command `conda config --env --add channels conda-forge`. This command tells Conda to search [Conda Forge](https://conda-forge.org/docs/user/introduction.html) for packages. Next, run the following command to install some packages: `conda install matplotlib pandas jupyterlab`.
5. Now, run `conda install scikit-learn otter-grader`. This installs the packages `scikit-learn` and `otter-grader`, both of which are essential for Homework 1.
6. You can use the command `conda list` to show all of the packages installed in your environment -- if everything has worked so far, you should see the packages we installed (scikit-learn, pandas, ...) plus their (many) dependencies.
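
As a complement to `conda list`, the installs from the steps above can also be checked from inside Python. The following is a minimal sketch, not an official course script: the helper name `check_packages` is hypothetical, and it assumes that `scikit-learn` is imported as `sklearn`, that `otter-grader` is imported as `otter`, and that `numpy` was pulled in as a dependency. It only reports whether each package can be found and does not import anything.

```python
import importlib.util

# Assumed import names: scikit-learn -> sklearn, otter-grader -> otter.
# numpy is not installed explicitly above; it is assumed to arrive as a dependency.
PACKAGES = ["numpy", "matplotlib", "pandas", "sklearn", "jupyterlab", "otter"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Run this with the `cs178` environment active; a `MISSING` entry suggests the corresponding `conda install` step should be repeated.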

**Note**: If you are unable to install a package with conda for any reason, you can install it with pip instead by running `pip install <package-name>` after activating your conda environment. The package will still be installed into the active conda environment. However, pip-installed packages can conflict with packages managed by conda, so please use conda whenever possible and fall back to pip only when necessary.

Congratulations! You've now set up your Conda environment with all of the necessary packages to complete the assignments in CS178. In the next section, we'll tinker around with some of the packages we've installed to better familiarize ourselves with our new tools.

### Additional Resources
Note that we've only covered the bare essentials of Conda. If you'd like to read more, here are some resources:
- [Conda User Guide](https://conda.io/projects/conda/en/latest/user-guide/index.html)
- [Conda Cheat Sheet](https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html)
- [Conda vs Pip](https://www.anaconda.com/blog/understanding-conda-and-pip)
- [Openlab JupyterHub @ ICS](https://wiki.ics.uci.edu/doku.php/virtual_environments:jupyterhub): This is a JupyterHub service provided by the ICS department. It can be accessed on campus or via VPN with your ICS account credentials. Most machine learning packages are already installed there, but you can install additional packages yourself using the Terminal feature in the JupyterLab interface if needed.



<center> <img src="http://sli.ics.uci.edu/extras/sep.png" alt="--------------------------------------------" width="200px" height="20px" style="width:200px;height:20px;"/> </center>

## Part 2: Jupyter Notebooks and Numpy

### Jupyter Notebooks

In the previous section, we installed a package called `jupyterlab`. This package lets us use **Jupyter Notebooks**. A notebook is a web-browser based tool that combines code, text, equations, and much more into a single document. Let's create a notebook and get a feel for how these work.

1. Make sure our `cs178` environment is active. Then, in your terminal, `cd` into the directory where you'd like to create your notebooks, and run `jupyter lab`. This should automatically open a tab in your browser. From there, we can create a new blank notebook (for example, via File > New > Notebook).

A notebook consists of many **cells**. Each cell can either contain Markdown (like this cell!) or code. In Markdown cells, we can use standard [Markdown syntax](https://www.markdownguide.org/cheat-sheet/) -- for example, we can make text **bold**, *italics*, make lists, write `code`, etc. 

Markdown cells additionally support mathematical equations via LaTeX. Inline math can be written by wrapping LaTeX code in `$[...]$`, and display-mode math can be written by wrapping LaTeX code in `$$[...]$$`. For example, here's some inline math: $a^2 + b^2 = c^2$, and here's some display-mode math: $$\int x^2 d x = \frac{1}{3} x^3 + C.$$ If you'd like to learn more about LaTeX, check out this [short tutorial](https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes). **LaTeX is a very useful tool for typesetting math equations, but using LaTeX isn't a requirement in this course.**

In code cells, we can type Python code as usual. We can also execute individual cells in order to run the code they contain.

In [1]:
a = 3
b = 5
c = a + b
print(c)

8


### Numpy Arrays

Let's now familiarize ourselves a little bit with Python and Numpy. We start by importing some packages that we'll be using.

In [2]:
import numpy as np
from sklearn.datasets import load_iris

First, we'll load in the Iris dataset and store the features in a numpy array `X` and the labels in the numpy array `y`.

In [3]:
iris = load_iris()
X = iris.data
y = iris.target

#### Printing 
We can print `X` and `y` to see their contents.

In [4]:
X

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       [4.9, 3

In [5]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

#### Shape
We can see the shape of a numpy array by using `.shape`. Here, we see that `X` is a numpy array with 150 rows and 4 columns, and `y` is a numpy array with 150 entries.


In [6]:
X.shape

(150, 4)

In [7]:
y.shape

(150,)
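
A few related attributes are often useful alongside `.shape`. The sketch below uses a small made-up array `A` (not the Iris data) so that the values are easy to check by hand; it also shows how the way a column is selected affects the shape of the result.

```python
import numpy as np

# Hypothetical toy array: 3 rows (samples) and 2 columns (features).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

n_rows, n_cols = A.shape   # the shape tuple can be unpacked directly
print(A.shape)             # (3, 2)
print(A.ndim)              # 2, the number of dimensions
print(A.size)              # 6, the total number of entries

print(A[:, 0].shape)       # (3,)   an integer index drops that dimension
print(A[:, [0]].shape)     # (3, 1) a list index keeps the result 2-dimensional
```

The distinction between `(3,)` and `(3, 1)` matters later on, since many functions expect a 2-dimensional array of features.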

#### Indexing.

We can access specific elements of numpy arrays by *indexing*. There are various ways of doing this, each of which is useful in different situations. It is important to be familiar with all of them.

##### Basic Indexing.

The simplest way of indexing a numpy array is by specifying integers corresponding to which entry we want. We can also access particular rows/columns of our numpy array.

In [8]:
alist = [[1,2],[2,3]]

In [9]:
# A Python list can only be indexed one dimension at a time
# The last line will raise a TypeError
print(alist[0])
alist[0,0]

[1, 2]


TypeError: list indices must be integers or slices, not tuple

In [10]:
# Numpy array can be indexed with multiple dimensions together
# Gets the entry of X in row 1 and column 2 -- remember that Python is zero-indexed!
X[1, 2]

1.4

In [11]:
# Gets the 7th entry of y
y[6]

0

In [12]:
# Get the first row of X and check its shape
X[0, :].shape
# X[0].shape would also work

(4,)

In [13]:
# Get the first column of X
X[:, 0]

array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8,
       4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5. ,
       5. , 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5. , 5.5, 4.9, 4.4,
       5.1, 5. , 4.5, 4.4, 5. , 5.1, 4.8, 5.1, 4.6, 5.3, 5. , 7. , 6.4,
       6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5. , 5.9, 6. , 6.1, 5.6,
       6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7,
       6. , 5.7, 5.5, 5.5, 5.8, 6. , 5.4, 6. , 6.7, 6.3, 5.6, 5.5, 5.5,
       6.1, 5.8, 5. , 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 6.3, 5.8, 7.1, 6.3,
       6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5,
       7.7, 7.7, 6. , 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2,
       7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6. , 6.9, 6.7, 6.9, 5.8,
       6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9])
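
To tie this back to the nested-list example at the start of this subsection, here is a small sketch of the two working alternatives to `alist[0, 0]`. The variable names `first_entry`, `arr`, and `same_entry` are introduced only for illustration.

```python
import numpy as np

alist = [[1, 2], [2, 3]]

# A nested list is indexed one level at a time.
first_entry = alist[0][0]

# Converting the list to a numpy array allows indexing both dimensions at once.
arr = np.array(alist)
same_entry = arr[0, 0]

print(first_entry, same_entry)   # 1 1
print(arr[:, 1])                 # the second column: [2 3]
```

Column access such as `arr[:, 1]` has no direct equivalent for nested lists, which is one reason numpy arrays are preferred for tabular data.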

##### Setting elements of an array.

Using indexing, we can set the value of a specific element in a numpy array.

In [14]:
# Sets the 7th entry of y (index 6) to 1
y[6] = 1 

In [15]:
y

array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
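
Assignment works with the other indexing styles covered in this section as well, not only with single integers. The following sketch uses made-up toy arrays `v` and `G` rather than `y`, so the Iris labels are left alone.

```python
import numpy as np

v = np.zeros(6, dtype=int)    # hypothetical toy array: [0 0 0 0 0 0]

v[0] = 7                      # set a single entry
v[1:3] = 5                    # set a slice; the scalar fills both positions
v[[4, 5]] = [8, 9]            # set several positions from a list of indexes
print(v)                      # [7 5 5 0 8 9]

G = np.arange(6).reshape(2, 3)   # rows are [0, 1, 2] and [3, 4, 5]
G[:, 0] = -1                     # overwrite the whole first column
print(G[0].tolist(), G[1].tolist())   # [-1, 1, 2] [-1, 4, 5]
```

Note that assignments like these modify the array in place. If the original values are still needed, make a copy first with `.copy()`.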

##### Indexing by Slicing.

Slicing lets us access multiple contiguous rows/columns.

In [16]:
# Get the first 3 rows of X
X[:3, :]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])

In [17]:
# Get rows 5, 6, 7 -- note row 8 is not included
X[5:8, :]

array([[5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2]])

In [18]:
# Get rows 5, 6, 7 of X and columns 1, 2
X[5:8, 1:3]

array([[3.9, 1.7],
       [3.4, 1.4],
       [3.4, 1.5]])
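
One detail worth knowing about slicing: a basic slice of a numpy array is a *view* onto the same data, not a separate copy, so writing to the slice also changes the original array. The sketch below illustrates this with a made-up array `A`; the names `view` and `safe` are for illustration only.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # hypothetical 3x4 array with entries 0..11

view = A[:2, :]           # basic slicing returns a view of A
view[0, 0] = 100          # ...so this write is visible through A too
print(A[0, 0])            # 100

safe = A[:2, :].copy()    # an explicit copy owns its own data
safe[0, 0] = -1
print(A[0, 0])            # still 100

print(np.shares_memory(A, view), np.shares_memory(A, safe))   # True False
```

When a slice of `X` is going to be modified, calling `.copy()` first keeps the original data intact.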

##### Negative Indexing.

You can also use negative indexes to count from the end of the array.

In [19]:
X[-1, :]   # Get the last row

array([5.9, 3. , 5.1, 1.8])

In [20]:
X[-2:, :]  # Get the last two rows

array([[6.2, 3.4, 5.4, 2.3],
       [5.9, 3. , 5.1, 1.8]])

In [21]:
X[:, :-1]  # First three columns

array([[5.1, 3.5, 1.4],
       [4.9, 3. , 1.4],
       [4.7, 3.2, 1.3],
       [4.6, 3.1, 1.5],
       [5. , 3.6, 1.4],
       [5.4, 3.9, 1.7],
       [4.6, 3.4, 1.4],
       [5. , 3.4, 1.5],
       [4.4, 2.9, 1.4],
       [4.9, 3.1, 1.5],
       [5.4, 3.7, 1.5],
       [4.8, 3.4, 1.6],
       [4.8, 3. , 1.4],
       [4.3, 3. , 1.1],
       [5.8, 4. , 1.2],
       [5.7, 4.4, 1.5],
       [5.4, 3.9, 1.3],
       [5.1, 3.5, 1.4],
       [5.7, 3.8, 1.7],
       [5.1, 3.8, 1.5],
       [5.4, 3.4, 1.7],
       [5.1, 3.7, 1.5],
       [4.6, 3.6, 1. ],
       [5.1, 3.3, 1.7],
       [4.8, 3.4, 1.9],
       [5. , 3. , 1.6],
       [5. , 3.4, 1.6],
       [5.2, 3.5, 1.5],
       [5.2, 3.4, 1.4],
       [4.7, 3.2, 1.6],
       [4.8, 3.1, 1.6],
       [5.4, 3.4, 1.5],
       [5.2, 4.1, 1.5],
       [5.5, 4.2, 1.4],
       [4.9, 3.1, 1.5],
       [5. , 3.2, 1.2],
       [5.5, 3.5, 1.3],
       [4.9, 3.6, 1.4],
       [4.4, 3. , 1.3],
       [5.1, 3.4, 1.5],
       [5. , 3.5, 1.3],
       [4.5, 2.3
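
The rule behind negative indexes is that index `-k` refers to the same position as `length - k` along that axis. Here is a minimal sketch on a made-up 3x4 array `A`, small enough to verify by hand.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]
n_rows, n_cols = A.shape

# Index -1 is the same position as (length - 1) along that axis.
assert np.array_equal(A[-1, :], A[n_rows - 1, :])

# The slice :-1 stops before the last column, just like :n_cols - 1.
assert np.array_equal(A[:, :-1], A[:, :n_cols - 1])

print(A[-1, :].tolist())     # last row: [8, 9, 10, 11]
print(A[-2:, -1].tolist())   # last column of the last two rows: [7, 11]
```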

##### Indexing with Arrays.

We can also access non-contiguous parts of a numpy array by specifying a list (or numpy array) of indexes.

In [22]:
# Gets rows 1, 5, 9 from X
rows = [1, 5, 9]
X[rows, :]

array([[4.9, 3. , 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.9, 3.1, 1.5, 0.1]])

In [23]:
# Gets first and last column from X
cols = [0, -1]
X[:, cols]

array([[5.1, 0.2],
       [4.9, 0.2],
       [4.7, 0.2],
       [4.6, 0.2],
       [5. , 0.2],
       [5.4, 0.4],
       [4.6, 0.3],
       [5. , 0.2],
       [4.4, 0.2],
       [4.9, 0.1],
       [5.4, 0.2],
       [4.8, 0.2],
       [4.8, 0.1],
       [4.3, 0.1],
       [5.8, 0.2],
       [5.7, 0.4],
       [5.4, 0.4],
       [5.1, 0.3],
       [5.7, 0.3],
       [5.1, 0.3],
       [5.4, 0.2],
       [5.1, 0.4],
       [4.6, 0.2],
       [5.1, 0.5],
       [4.8, 0.2],
       [5. , 0.2],
       [5. , 0.4],
       [5.2, 0.2],
       [5.2, 0.2],
       [4.7, 0.2],
       [4.8, 0.2],
       [5.4, 0.4],
       [5.2, 0.1],
       [5.5, 0.2],
       [4.9, 0.2],
       [5. , 0.2],
       [5.5, 0.2],
       [4.9, 0.1],
       [4.4, 0.2],
       [5.1, 0.2],
       [5. , 0.3],
       [4.5, 0.3],
       [4.4, 0.2],
       [5. , 0.6],
       [5.1, 0.4],
       [4.8, 0.3],
       [5.1, 0.2],
       [4.6, 0.2],
       [5.3, 0.2],
       [5. , 0.2],
       [7. , 1.4],
       [6.4, 1.5],
       [6.9,

##### Logical Indexing.

We can perform logical operations on an array, and use the results to index the array.

In [24]:
import numpy as np

In [25]:
Z = np.random.rand(10)  # A numpy array with 10 random elements

In [26]:
Z

array([0.51607579, 0.94578226, 0.88340215, 0.54643114, 0.62716546,
       0.95549804, 0.60167512, 0.63303278, 0.00324649, 0.78075895])

In [27]:
# Check where entries are larger than 0.5
M = Z>0.5
M

array([ True,  True,  True,  True,  True,  True,  True,  True, False,
        True])

In [28]:
# Z[Z>0.5] will also work
Z[M]

array([0.51607579, 0.94578226, 0.88340215, 0.54643114, 0.62716546,
       0.95549804, 0.60167512, 0.63303278, 0.78075895])

In [29]:
~M

array([False, False, False, False, False, False, False, False,  True,
       False])

In [30]:
# equivalent to Z[Z<=0.5]
Z[~M]

array([0.00324649])

Indexing an array with an array of boolean values (such as `Z > 0.5`) selects the elements at the positions where the boolean array is `True`, keeping them in their original order. On the other hand, indexing an array with a list or array of integers selects the elements at those positions, in the order given by the list.
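
The difference between the two styles is easiest to see side by side. This sketch uses a fixed, made-up array `Z` instead of random values so that the results are reproducible; `np.flatnonzero` is used here to convert a boolean mask into the equivalent integer positions.

```python
import numpy as np

Z = np.array([0.9, 0.1, 0.6, 0.3])   # fixed values instead of np.random.rand

mask = Z > 0.5
print(Z[mask])                # [0.9 0.6]  original order is preserved

idx = np.flatnonzero(mask)    # integer positions where the mask is True
print(idx)                    # [0 2]

print(Z[[2, 0, 0]])           # [0.6 0.9 0.9]  order and repeats follow the list

# Conditions can be combined with & (and) and | (or).
# Each condition needs its own parentheses.
print(Z[(Z > 0.2) & (Z < 0.8)])   # [0.6 0.3]
```

Boolean indexing is the natural choice for filtering by a condition, while integer indexing is useful for reordering or resampling rows.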

<center> <img src="http://sli.ics.uci.edu/extras/sep.png" alt="--------------------------------------------" width="200px" height="20px" style="width:200px;height:20px;"/> </center>

## Part 3: Questions?

### Additional Resources
If you'd like to read more about Markdown, LaTeX, Jupyter Notebooks, or Python/Numpy, here are some useful resources:
- [Markdown Cheatsheet](https://www.markdownguide.org/cheat-sheet/)
- [Learn LaTeX in 30 Minutes](https://www.overleaf.com/learn/latex/Learn_LaTeX_in_30_minutes)
- [Jupyter Notebook Tutorial](https://www.dataquest.io/blog/jupyter-notebook-tutorial/)
- [A More In-Depth Python Tutorial](https://cs231n.github.io/python-numpy-tutorial/)
- [Numpy QuickStart](https://numpy.org/doc/stable/user/quickstart.html)
- [SciPy Lectures](http://scipy-lectures.org/)