# Lab 1: Introduction & Set up
## Data Structures & Algorithms
### Week 1 
*12-13/02/2025*

## Today

* [Some preliminary stuff for the DS&A labs.](#prelim)
* [Helpful tools](#tools)
* [Setting up a new project...](#setup)
* [Good coding practice :)](#goodcoding)
* [Before you get started](#getstarted) (optional)
* [Let's start coding!](#exercises)

# Some preliminary stuff. <a class="anchor" id="prelim"></a>

### How will these labs be structured?

I will try to make the lab materials available 24-48 hours in advance for those who wish to familiarise themselves with the content beforehand (entirely optional!).

Labs themselves will be 90 minutes in total, divided into:
1. Taught content for the week, as well as a refresher on content covered in the lectures / previous labs,
2. Exercises (individual / collaborative - either OK!),
3. Any other coding-related work & questions, **if there is time**. Otherwise, reach out to me over email: 228755@students.hertie-school.org

### Lab 'rules' / advice

- There are **no stupid questions.**

- **Debugging is part of coding.** If your code doesn’t work, that’s normal! Learn to read error messages, print intermediate outputs, and systematically isolate problems. Debugging is a key skill—not an obstacle.

- **Try it yourself** before asking others / Google / Stack Overflow / ChatGPT (/ LLM of your choice)
    - Especially for the ChatGPT route, I highly recommend reading the **full output** explaining what was wrong with the input code and what ChatGPT is changing rather than blindly copying the provided code chunk. This is both a great way to learn, and also helps you as a coder to proactively identify instances where ChatGPT has got stuck.

- **Version control is your friend.** Use Git (even locally) to track changes and avoid catastrophic losses. Commit often with meaningful messages.

- We are a heterogeneous class with varying levels of comfort and experience in coding, specifically in Python. As with Maths for Data Science, my approach is to try to provide assistance that is most useful to as many of you as possible. This means that if we are covering material you are already comfortable with, please feel free to sit nearer the back of the lab and work at your own pace. I do **expect everyone to go through the materials at least once**, to make sure there are no knowledge gaps, but I don't intend to waste the time of students with more advanced skills. If that's you, please treat these as a useful refresher.

- The final result of this course is the delivery of a Flask web app. That's no small thing for a non-technical intake: appreciate that this is a stretch, and that by the end you will have concrete, professionally relevant skills that you did not have beforehand.

- **Enjoy the class** and appreciate that it's a challenge: writing good code is fiddly, frustrating & satisfying!
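
To make the debugging advice above a little more concrete, here is a minimal sketch of printing intermediate values to isolate a problem, and of reading the resulting error message. The function `average` and its inputs are hypothetical, made up purely for illustration.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    total = sum(values)
    count = len(values)
    # Temporary debugging output: inspect intermediate values before the division.
    print(f"DEBUG total={total}, count={count}")
    return total / count


print(average([2, 4, 6]))

# An empty list triggers an error; the message tells us what went wrong.
try:
    average([])
except ZeroDivisionError as error:
    print(f"Caught: {error}")
```

The debug line shows that `count` is 0 just before the failure, which points directly at the cause. Once the problem is understood, such print statements are usually removed again.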


### Access lab resources and exercises
You will find the Jupyter notebooks for the labs at [https://github.com/henrycgbaker/data_structures_and_algorithms_2025](https://github.com/henrycgbaker/data_structures_and_algorithms_2025). 

To access the Jupyter notebooks, the easiest thing is to `git clone` the repository. To do this, you'll need to install Git (if you haven't already done so for IDS) and learn some basic terminal commands.

You can open the terminal by searching for "Terminal" (Mac), or by opening Command Prompt or Git Bash (Windows).

### Installing Git
#### For Windows:
1. Visit [git-scm.com](https://git-scm.com) and download the Windows installer.
2. During installation, select "Git from the command line" when prompted.
3. After installation, open Command Prompt (search for "cmd") and verify the installation by running ```git --version```.

#### For Mac: 

1. Install Homebrew (if you don't have it): ```/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"```
2. Install Git using Homebrew: ```brew install git``` 
*Alternatively, you can download Git directly from [git-scm.com](https://git-scm.com).*
3. Verify the installation by running ```git --version```

---
### Cloning the Repository:

Next we will need to clone the repository into our local machine. This can be done by navigating within the terminal to the desired directory and typing the ```git clone``` command. 

1. In-terminal, navigate to where you would like to set up your repository (I recommend that you create a directory called ```repositories``` either at the root directory, or within your documents directory).  
*Useful commands can be found in the [Helpful tools](#tools) section below.*
2. Clone the repo:

```
git clone https://github.com/henrycgbaker/data_structures_and_algorithms_2025.git
```
This will create a directory (inside whichever folder your terminal is currently in) and copy the remote repository onto your local machine. 

---

### Running Jupyter Notebooks:
Once you've cloned the repository, follow the guide in the [Before you get started](#getstarted) section for a step-by-step walkthrough on how to run Jupyter Notebooks.

📌 Note: All lab materials will be distributed via GitHub, not Moodle. Check the repository regularly for updates (syncing it, e.g. with `git pull`, will download the new material each week).

## Helpful tools <a class="anchor" id="tools"></a>

Here is a list of tools and resources that can help you along the way. Use these to go through the steps in the [Before you get started](#getstarted) section.

* **How to work with the command line?** It's useful for anyone working with data and code to know how to use the command line (aka terminal/shell); this [article](https://www.dataquest.io/blog/why-learn-the-command-line/) explains why. [Here](https://tutorial.djangogirls.org/en/intro_to_command_line/) is an introduction to the most important commands on different operating systems. Using [TMUX](https://deliciousbrains.com/tmux-for-local-development/#what-is-tmux) will make using the command line much easier! 
* **How to install Python and keep track of dependencies?** I highly recommend using a virtual environment, ideally with [miniconda](https://docs.anaconda.com/free/miniconda/#quick-command-line-install)! Miniconda [CHEATSHEET](https://docs.conda.io/projects/conda/en/latest/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf)
* **Where/how to write your code?** Choose an integrated development environment (IDE) or code editor (I recommend [VSCode](https://code.visualstudio.com/) or [PyCharm](https://www.dataquest.io/blog/how-to-set-up-pycharm-community-edition/)) and install [Jupyter Notebooks](https://realpython.com/jupyter-notebook-introduction/) for code experimentation (and to run the DS&A labs), and use Jupyter [keyboard shortcuts](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330).
* **How to keep track of changes?** [Download and install Git](https://www.atlassian.com/git/tutorials/install-git)! Git [CHEATSHEET](https://education.github.com/git-cheat-sheet-education.pdf)
* **How to collaborate?** Sign up to GitHub.
* **How to write basic syntax in Python?** Look at this [CHEATSHEET](https://www.pythoncheatsheet.org/cheatsheet/basics) or use StackOverflow.

# Setting up a new project <a class="anchor" id="setup"></a>

Here are some best-practice steps you can take *every time* you start a new project (including a 'project' for what we'll be doing in DS&A labs), to keep your code organised, sharable, and up-to-date. The [Before you get started](#getstarted) section below goes through these steps one by one.

1. **Create a new directory** for each project! (I recommend having a dedicated `repositories` folder near the root directory.)
2. **Set up a virtual environment** with your preferred Python version using conda.
3. **Install jupyter** into this environment by running the command: ```conda install jupyter```
4. **Set up a new project in your IDE (VSCode / PyCharm)**, selecting your preferred Python installation as the interpreter (ideally the one you just created using conda).
5. **Set up git** (create .gitignore, initialise, first commit, etc.).
6. **Start jupyter** within your environment by running the command: `jupyter notebook` (or open the `.ipynb` file with your preferred IDE)

# Refresher: some best practice for coding <a class="anchor" id="goodcoding"></a>

The course textbooks are a great resource and there are many blog posts on best-practice for coding. Here's a very top-line summary:

* Think carefully about naming conventions for variables, classes and functions
* Write good documentation and comments
* Make sure your code is reusable and scalable
    * Once you find yourself repeating a bit of code, write a function for it.
    * High cohesion (code *within* modules/classes/functions should be closely related).
    * Low coupling (code in *different* modules/classes/functions should depend on each other as little as possible).
* Test your code (the smaller the units you test, the easier you will make your life in the future).
* Track your changes; remember to use version control so that your collaborators and future self understand what you have changed/added!
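
As a rough illustration of the points above (descriptive names, a docstring, reuse through a function, and a small test), consider the sketch below. The function `celsius_to_fahrenheit` is a hypothetical example and not part of the lab materials.

```python
def celsius_to_fahrenheit(temperature_celsius):
    """
    Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    temperature_celsius : a numeric value
    """
    return temperature_celsius * 9 / 5 + 32


# Reuse the function instead of repeating the formula for every value.
readings_celsius = [0, 25, 100]
readings_fahrenheit = [celsius_to_fahrenheit(reading) for reading in readings_celsius]
print(readings_fahrenheit)

# A tiny unit test: small, checkable facts about one function.
assert celsius_to_fahrenheit(0) == 32
assert celsius_to_fahrenheit(100) == 212
```

The two `assert` lines are the simplest possible form of testing; if either fact stops being true after a later change, Python raises an `AssertionError` straight away.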

# Before you get started <a class="anchor" id="getstarted"></a>

If you have been coding in Python for a while and already have your preferred set-up, you can skip this section and go on to the coding exercises. Otherwise, go through the following steps to create a handy and easily reproducible coding environment, before you dive into the coding exercises. Doing this once will very likely make your coding experience in the future much easier, even if it seems like a hassle in the beginning. **Use the resources and links listed in the [Helpful tools](#tools) section.**

### 1. Familiarise yourself with the command line
Being familiar with the command line, and being used to working with it, will help you set up and navigate data processing pipelines, work with data that is stored remotely (i.e. not on your local computer), switch between different programmes, and deploy web apps (something we'll be doing by the end of this course). 

It's a little unwieldy at first and not necessarily intuitive, but it is essential and worth putting in a little time initially to get familiar. If you have never used the command line before, open the command line introduction and familiarise yourself with the commands to:
1. print the path of the current directory: `pwd` 
    - "print working directory".
    - outputs the full path to the current directory you're working in.
    - an absolute path
    - e.g. `/home/user/projects`
2. move between directories (aka folders): `cd`
    - "change directory"
    - `cd repositories`
    - `cd ..`
    - `cd ~` (`~` represents home dir)
    - absolute vs relative paths
3. print a list of files and subdirectories within the current directory: `ls`
    - "list"
    - `ls -l` - detailed information (like permissions, sizes, and timestamps)
    - `ls -a` - list files and folders including hidden ones (those starting with .)
    - NB: can combine flags: `ls -la`
4. create new directories: `mkdir`
    - "make directory"
    - `mkdir dir1 dir2 dir3`
    - `mkdir -p parent_dir/child_dir/grandchild_dir` (directory with nested subdirectories, using `-p` flag)
    - NB: `-p` ensures all parent directories are created if they don’t exist; without `-p`, you would get an error if parent_dir didn’t already exist.
5. remove files and folders: `rm`
    - "remove"
    - remove file: `rm filename`
    - remove (empty) directory: `rmdir directory_name`
    - remove directory with its contents: `rm -r directory_name` (`-r`: recursive flag)
6. (optional and slightly more advanced) use TMUX (terminal multiplexer), a really helpful tool that lets you run multiple terminal (command line) windows in parallel.
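
The same operations are also available from within Python, which can help to connect the shell commands above to code. This is a minimal sketch using the standard-library `pathlib`, `tempfile` and `shutil` modules; it works inside a temporary directory so nothing else on your machine is changed, and the directory and file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example cannot touch real files.
sandbox = Path(tempfile.mkdtemp())

print(Path.cwd())                      # like `pwd`: absolute path of the working directory

nested = sandbox / "parent_dir" / "child_dir"
nested.mkdir(parents=True)             # like `mkdir -p parent_dir/child_dir`

notes = nested / "notes.txt"
notes.write_text("hello")              # create a small file to work with

names = sorted(p.name for p in nested.iterdir())   # like `ls`
print(names)

print(nested.parent == sandbox / "parent_dir")     # `..` refers to the parent directory

notes.unlink()                         # like `rm notes.txt`
nested.rmdir()                         # like `rmdir child_dir` (only works when empty)
shutil.rmtree(sandbox)                 # like `rm -r`: removes a directory and its contents
```

As with `rm -r`, `shutil.rmtree` deletes without asking for confirmation, so it is worth double-checking the path before running it on anything that matters.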

### 2. Set up a tool for virtual environments
Virtual environments are a great way to separate your package dependencies and even your Python versions. We'll be using a dedicated virtual environment for this course as well as ML (and future Hertie technical courses).

I recommend Miniconda (which provides the `conda` command), since it is not limited to Python; you can even set up an R environment with conda. 
- *Alternatives are `virtualenv` and `pipenv`.* 
- If you already work with virtual environments, you can skip this step. Otherwise:

1. **install Miniconda**:
    - either from the [webpage](https://docs.anaconda.com/miniconda/install/)
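    - or from the command line, after downloading the installer: 
        - Mac: `bash Miniconda3-latest-MacOSX-x86_64.sh` (Intel; on Apple silicon the installer is `Miniconda3-latest-MacOSX-arm64.sh`)
        - Windows: `Miniconda3-latest-Windows-x86_64.exe`
    - follow the prompts (accept default install location)
    - initialise: `source ~/.bashrc` (or `source ~/.zshrc` if your shell is zsh, the default on recent versions of macOS)
    - verify: `conda --version`
2. **create a new virtual environment**: name it `dsa` and install a recent Python version into it (3.11 here): `conda create -n dsa python=3.11`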
3. **verify**: by running the conda command that lists all environments: `conda env list`
4. **activate the new environment** that you created: `conda activate dsa`
5. **list the packages** that are installed within this environment: `conda list`
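
Once the environment is activated, a quick check from inside Python can confirm that you are running the interpreter you expect. This is only a sketch: it relies on the `CONDA_DEFAULT_ENV` environment variable, which conda sets on activation, and the fallback text is my own placeholder.

```python
import os
import sys

# Path of the Python interpreter that is running this code.
# Inside an activated conda environment it typically points into that environment's folder.
print(sys.executable)

# Major and minor version of the running interpreter, e.g. (3, 11) for python=3.11.
print(sys.version_info[:2])

# conda sets CONDA_DEFAULT_ENV when an environment is activated;
# outside conda the variable is absent, so a fallback string is printed instead.
active_env = os.environ.get("CONDA_DEFAULT_ENV", "no conda environment detected")
print(active_env)
```

If the last line does not print `dsa`, the environment is probably not activated in the terminal (or IDE) from which Python was started.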


### 3. Create a new Git repo (or clone an existing one)

Using version control with git will make your collaborators' and future self's life *a lot* easier. It helps you to track your own changes on a project and to collaborate effectively with your teams. Generally, this part of the workflow will consist of creating a new repo or cloning an existing one from GitHub. 

The steps are:

1. **install git**: (run `git --version` to check if you already have it installed)
2. **clone the existing DS&A labs repository**: run `git clone https://github.com/henrycgbaker/data_structures_and_algorithms_2025.git` and then move into the new directory that this creates (`cd data_structures_and_algorithms_2025`).
3. **branch**: now create and move into a new branch by running `git checkout -b lab1` (where lab1 is now the name of the branch you have created)

### 4. Set up Jupyter Notebooks
- Jupyter notebooks are a great way to
    - experiment quickly with code, 
    - present code, 
    - or create data science workflows. 
- *NB: The final functions and classes (and testing) that you write for a project should **not** sit in a Jupyter notebook. Those should be written in `.py` files (aka Python modules) or, even better, in Python packages. We'll get to this in a few weeks...*
    - *One best-practice tip: start your code experiments in a Jupyter notebook and once you find yourself using some functions/classes repeatedly or think you might need them in other notebooks, migrate them into Python modules (and later to a package).*
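
To give a rough idea of what migrating a function out of a notebook looks like, here is a sketch of a small module. The file name `text_helpers.py` and the function `count_words` are hypothetical, chosen only for illustration.

```python
"""text_helpers.py: hypothetical module holding functions moved out of a notebook."""


def count_words(text):
    """Return the number of whitespace-separated words in a string."""
    return len(text.split())


if __name__ == "__main__":
    # Runs only when the file is executed directly (python text_helpers.py),
    # not when it is imported with `from text_helpers import count_words`.
    print(count_words("data structures and algorithms"))
```

A notebook stored in the same directory could then use `from text_helpers import count_words` instead of redefining the function in every notebook.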

To run this Jupyter notebook, follow these steps:
1. make sure the `dsa` environment you created in step 2 is activated
2. install jupyter into this environment using the `pip install jupyter` command (installing something *into* an environment just means that you need to run the installation command after having activated the environment)
3. now move into the directory for the DS&A repository that you cloned in step 3 (called `data_structures_and_algorithms_2025`).
4. run `jupyter notebook`; this will start your default browser (or open a new tab) and you should now see a folder structure that includes the Jupyter notebook file `lab1.ipynb`
5. click on `lab1.ipynb` to run it and familiarise yourself with the keyboard shortcuts for running cells, creating new cells (above and below the current cell), and removing cells.

### 5. Set up your IDE
- After step 4, you can now write code in Jupyter notebooks. 
- For the type of coding necessary for larger (and collaborative) projects, you'll also need an IDE (integrated development environment). IDEs are easier to work with than the plain Jupyter Notebook interface launched from the command line above.
- This is where you'll write classes, functions, helper functions, tests etc. 
- I recommend VS Code or PyCharm, because they have many 'intelligent code' features such as code prediction, readability support, error detection and easy refactoring, and because they provide debugging tools.
- You can either open a project from the IDE, or open the file using the IDE programme. 
    - The 'more correct' way is to open up the project from the IDE.
    - **For VS Code**: 
        - click the `Explorer` icon on the top left hand corner
        - select `Open Folder`
        - select the `Interpreter` (top right, also the central top bar), choose the Python installation that is within the `dsa` conda environment you created above
    - **For PyCharm**
        - on the `New Project` screen, set the `Location` path to the path to the `data_structures_and_algorithms_2025` directory 
        - change the radio button from `New environment using ...` to `Previously configured interpreter`
        - in the `Interpreter` dropdown menu, choose the Python installation that is within the `dsa` conda environment you created above

## Let's get coding! <a class="anchor" id="exercises"></a>

### Exercise 1
Write a function that prints a string to the screen.

```python
def print_string(x):
    """
    Print the input string

    Parameters
    ----------
    x : a string
    """

    # Implement me
```

### Exercise 2
Write a function that takes a name of a person as an input and prints 'Hello ', followed by the name.

```python
def greet_someone(x):
    """
    Print 'Hello ' followed by the input string

    Parameters
    ----------
    x : a string
    """

    # Implement me
```

EXTENSION

Try to modify the previous function in a few ways (one step at a time):

* ensure that the first letter of the name is capitalised when printed
* when the names that are passed are 'Margaret' or 'Henry', they should be greeted with 'Dearest ' instead of 'Hello '
* include some more code that checks if the input is a string; if it isn't a string, the function should return `None`. *Hint: Look up how to use the `isinstance()` function; think about where in the function this should go!*
* now rewrite this again, this time make the function raise an error if the input is not a string. *Hint: Look up how to deal with a `TypeError` exception in Python.*
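
For the last two hints, here is a small sketch of how `isinstance()` and `TypeError` behave. It uses a hypothetical function `shout` that is unrelated to the exercise, so the exercise itself is still left to you.

```python
def shout(text):
    """Return the input string in upper case; raise TypeError for non-strings."""
    # Validate the input first, before doing any work with it.
    if not isinstance(text, str):
        raise TypeError(f"expected a string, got {type(text).__name__}")
    return text.upper()


print(shout("hello"))

# Calling the function with the wrong type raises the error, which can be caught.
try:
    shout(42)
except TypeError as error:
    print(f"TypeError: {error}")
```

Putting the type check at the very top of the function means no other code runs on invalid input.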

### Exercise 3
Write a function that takes as an input two numeric values `a` and `b` and returns their sum.

```python
def sum_two_values(a, b):
    """
    Sum up two numeric values

    Parameters
    ----------
    a : a numeric value
    b : a second numeric value
    """

    # Implement me
```

EXTENSION

Try to modify the previous function to:

* raise an error if one of the two values is NOT a number. *Hint: Check for the types `int` and `float`.*

### Exercise 4
Write a function that takes an integer x > 1 as an input and returns the sum of all integers 1 to x (including x).

```python
def sum_integers(x):
    """
    Return the sum of integers 1 to x.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
```

EXTENSION

Write an alternative version of this function called `sum_integers2` using something called 'list comprehension' and the built-in `sum` function.

*TIP: The general syntax for list comprehension is:*
```
[expression for item in iterable if condition]
```
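
To see how the parts of this syntax map onto real code, here is a small sketch on a deliberately different task (filtering and transforming words), alongside the equivalent explicit loop. The list `words` is made-up example data.

```python
words = ["tree", "heap", "graph", "set", "queue"]

# expression: word.upper() | item: word | iterable: words | condition: len(word) > 4
long_words_upper = [word.upper() for word in words if len(word) > 4]
print(long_words_upper)

# The equivalent explicit loop, for comparison.
long_words_upper_loop = []
for word in words:
    if len(word) > 4:
        long_words_upper_loop.append(word.upper())

print(long_words_upper == long_words_upper_loop)
```

The `if condition` part is optional; without it, every item of the iterable is included.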

### Exercise 5
Write a function that checks if an integer is a multiple of another integer.

```python
def is_multiple(a, b):
    """
    Return True if integer a is a multiple of integer b.

    Parameters
    ----------
    a : an integer
    b : another integer
    """

    # Implement me
```

### Exercise 6
Use the function written in Exercise 5 to write a function that checks if an integer is even.

```python
def is_even(x):
    """
    Return True if integer x is even.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
```

### Exercise 7
Write a function to determine the sum of all the multiples of 3 or 5 below 1000. *Hint: You could, again, use the function written in Exercise 4 and adapt it using the function written in Exercise 5.*

```python
def calculate_sum_of_multiples(x):
    """
    Return the sum of multiples of 3 or 5 below the input value x.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
```

EXTENSION

Can you do this in one line?

### Exercise 8
Determine the 20th Fibonacci number. For this, write a function to return the nth term of the Fibonacci sequence, with the first two terms being $x_0 = 0$ and $x_1 = 1$. The Fibonacci sequence continues by adding the previous two terms, so with these two starting values, the first few terms of the sequence are $0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...$.

```python
def fibonacci(n):
    """
    Return the nth value in the Fibonacci sequence.

    Parameters
    ----------
    n : an integer
    """

    # Implement me
```

### Exercise 9
Determine the sum of the even terms in the Fibonacci sequence (below 4 million). *Hint: Write a function that sums even terms below some integer n. Can you use the functions you implemented for Exercises 6 and 8?*

```python
def sum_even_fibonacci(x_max):
    """
    Sum even Fibonacci numbers below x_max.

    Parameters
    ----------
    x_max : an integer
    """

    # Implement me
```