# Self-evaluate your scientific Python skills

![](https://img.shields.io/badge/Python->3.11-blue)
![](https://img.shields.io/badge/Last_update-Sep._2025-yellow)
![](https://img.shields.io/badge/Status-available-green)
[![](https://img.shields.io/badge/Institution-IPGP-red)](https://www.ipgp.fr/)
[![](https://img.shields.io/badge/Contact-seydoux@ipgp.fr-red)](mailto:seydoux@ipgp.fr)


## 1. Introduction

This notebook is a self-assessment to help you determine whether you have the Python skills needed to succeed in this course. It contains a series of questions that you must answer by adding code cells and, if needed, markdown explanations. The exercises should feel manageable if you have the expected background. If you find them too difficult, we advise against enrolling in this course, as it requires a reasonably solid understanding of Python and its scientific libraries. However, if you’re unsure and would like to discuss your level and options, feel free to reach out to the teaching team; we’re happy to help. This session also introduces key resources that can help you solve problems throughout the course:

- Always check the **official documentation** to understand a library's API and find usage examples. For example, the [NumPy documentation](https://numpy.org/doc/stable/) provides a comprehensive guide to the library's functions and classes.

- **Search engines** and large language model chatbots (e.g., ChatGPT, Gemini, Claude) can provide quick answers, but always verify the information before using it. We do not forbid these tools, but we encourage you to use them wisely.

- **Q&A platforms** such as [Stack Overflow](https://stackoverflow.com/) let developers share solutions to programming problems posed by others. You can search for your issue and often find answers to similar questions that have already been asked. Not all answers are accurate, so always double-check them.

- Quick references for common functions in popular scientific libraries are also often available in the form of **cheat sheets**. At IPGP, a group of researchers have created a [Python scientific cheat sheet](https://ipgp.github.io/scientific_python_cheat_sheet/) that you can take time to explore. There are also library-specific cheat sheets, such as the [NumPy](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf) or the [Matplotlib](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf) cheat sheets.

Use these resources wisely to support your learning and problem-solving throughout the course. This notebook was created by Léonard Seydoux for the Earth Data Science course at the [Institut de Physique du Globe de Paris](https://www.ipgp.fr/fr). It was proofread by Geneviève Moguilny and Alexandre Fournier in 2024. If you find any mistakes despite our efforts, please let us know.

## 2. Iterables

In this course, we will frequently use the term **object**. In Python, everything is an object. An object is a data structure that contains two main components: **attributes** and **methods**. Attributes are the data stored within the object, while methods are functions that can be applied to the data.

For example, a `list` object contains a set of elements (the data), and also has a variety of methods that operate on this list. Some of these methods include `append()`, `sort()`, and `reverse()`. You can find a complete list of methods available for a `list` object in the [official Python documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).
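
The difference between attributes and methods can be sketched with a built-in complex number. The variable names below are chosen for illustration only; attributes are read without parentheses, while methods are called with them.

```python
# A complex number stores its data in attributes and exposes methods.
z = 3 + 4j

# Attributes are accessed without parentheses.
real_part = z.real
imaginary_part = z.imag

# Methods are called with parentheses and may return a new object.
z_conjugate = z.conjugate()

print(real_part, imaginary_part, z_conjugate)
```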

### 2.1. Lists

A list is a collection of elements, which can be of any data type, such as integers, floats, strings, or even a mix of different types. Lists are defined using square brackets `[]`, with elements separated by commas. A key characteristic of lists is that they are **ordered**, meaning the sequence in which the elements appear is important. For instance, the list `[1, 2, 3]` is considered different from `[2, 1, 3]`. 

Additionally, lists are indexed, meaning each element has a specific position. The first element in the list has an index of `0`, the second element has an index of `1`, and so on. Negative indices can also be used, with `-1` referring to the last element, `-2` to the second-to-last, and so forth.
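
As a quick illustration of positive and negative indexing, here is a minimal sketch using a hypothetical list that is not part of the exercises:

```python
# Hypothetical list used only for illustration.
planets = ["Mercury", "Venus", "Earth", "Mars"]

first = planets[0]             # index 0 is the first element
last = planets[-1]             # index -1 is the last element
second_to_last = planets[-2]   # index -2 is the one before the last

print(first, last, second_to_last)
```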

> **Questions**
> 1. Create a list containing the integers from 4 to 6. Store this list in a variable called `integers` and display it.
> 2. Create a list containing the first three letters of the alphabet. Store this list in a variable called `letters` and display it.
> 3. Display the second element of the `letters` list. Then, change this second element to the letter `"d"` without redefining the entire list, and display the updated list.
> 4. Combine the `integers` and `letters` lists into a new list. Choose an appropriate name for the new list and display it. What do you observe when you combine the two lists?
> 5. Multiply the `letters` list by 2 and display the result. What happens when you do this?
> 6. Try multiplying the `letters` list by itself (i.e., `letters * letters`). What do you observe? Why do you think this happens?

### 2.2. List Methods

In Python, a **method** is a function that operates on an object. For example, the method `append()` is applied to a `list` object and adds an element to the end of the list. Each object in Python has its own set of methods that can be used to modify or interact with the data inside the object.

You can find a full list of methods available for a `list` object in the [official Python documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).
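
One point that often surprises newcomers is that several list methods modify the list in place and return `None`. The sketch below, with hypothetical values, contrasts this with the built-in `sorted()` function, which returns a new list.

```python
depths = [30, 10, 20]  # hypothetical values

# append() modifies the list in place and returns None.
result = depths.append(5)

# sorted() returns a new list and leaves the original untouched,
# whereas the sort() method would reorder the list in place.
ordered_copy = sorted(depths)

# reverse() also works in place.
depths.reverse()

print(result)        # None
print(depths)        # [5, 20, 10, 30]
print(ordered_copy)  # [5, 10, 20, 30]
```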

> **Questions**
>
> 1. Create a list called `molecules` containing the following elements: `"CH4", "H2O", "NH3", "O3", "CO2", "CH3COOH"`.
> 2. Find and display the index of the `"H2O"` molecule in this list.
> 3. Now, given the list `"CH4", "H2O", "NH3", "O3", "H2O", "CH3COOH", "H2O"`, find and display all indices where `"H2O"` appears in the list.

### 2.3. Tuples

Tuples are similar to lists in Python, but with one key difference: **tuples are immutable**. This means that once a tuple is created, it cannot be modified—unlike lists, which can have elements added, removed, or changed. A tuple is defined using parentheses `()`, with elements separated by commas. For example, the tuple `(1, 2, 3)` contains the integers 1, 2, and 3.
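
Immutability can be illustrated with a short sketch. The tuple below is a hypothetical example; attempting item assignment raises a `TypeError`, while building a new tuple from an existing one is allowed.

```python
coordinates = (48.85, 2.35)  # hypothetical (latitude, longitude) pair

try:
    coordinates[0] = 0.0  # item assignment is not supported by tuples
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# Concatenation builds a new tuple; the original is left unchanged.
extended = coordinates + (35.0,)
print(extended)
```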

> **Questions**
> 
> 1. Create a tuple called `alphabet` containing the first three letters of the alphabet. Print the second element of the tuple.
> 2. Then, attempt to replace the second element with the letter `"d"`, **without redefining the entire tuple**, and print the result. What do you observe?
> 3. Try to add the list `letters` to the tuple `alphabet` and print the resulting object. What do you observe?

## 3. NumPy

The [NumPy library](https://numpy.org/doc/stable/) is essential for scientific computing in Python. It is an open-source project developed by its community of contributors and fiscally sponsored by NumFOCUS. NumPy is short for "Numerical Python" and is used for working with arrays and matrices. Many other libraries, such as [SciPy](https://www.scipy.org/) (scientific computing), [Matplotlib](https://matplotlib.org/) (plotting), [Pandas](https://pandas.pydata.org/) (data analysis), and [Scikit-learn](https://scikit-learn.org/stable/) (machine learning), are built on top of NumPy. Documentation for these libraries is available on their respective websites.

In this section, we will cover the basics of NumPy and explore how to work with arrays and matrices.

```python
import numpy as np
```

### 3.1. Warm-up

This first set of questions will help you assess your understanding of the basic functionality of the NumPy library. These exercises will focus on creating arrays using different numerical ranges and structures, which are fundamental operations in NumPy. Take your time to fully understand each operation and its output.
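
Before starting, it may help to recall how the endpoints are handled by two common array-creation functions. The following is a minimal sketch with arbitrary values, not a solution to the questions below.

```python
import numpy as np

# np.arange(start, stop, step) excludes the stop value.
half_open = np.arange(0, 5, 1)

# np.linspace(start, stop, num) includes both endpoints by default.
closed = np.linspace(0.0, 1.0, 5)

# Arrays filled with a constant value can be created directly.
ones = np.ones(3)

print(half_open.shape, closed.shape, ones.shape)
```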

> **Questions**
>
> 1. Create a NumPy array called `integers` containing the integers from 0 to 14 (inclusive). Use a step of 1 between consecutive integers. 
>
> 2. Create a NumPy array called `reals` containing values starting from 0 up to 100 (inclusive), but this time, with a step of 0.5 between consecutive values. Be mindful of the precision when defining the step and the endpoint of the array.
>
> 3. Create a NumPy array called `angles` that ranges from $-\pi$ to $\pi$. The array should contain 30 evenly spaced values between these two limits. To do this, you may want to use a function that generates a specified number of points within a given interval.
>
> 4. Create a NumPy array called `zeros` that consists of 10 elements, all set to 0. Use a specific NumPy function designed for creating arrays with constant values.

### 3.2. Basic Operations with NumPy Arrays

In this section, you will perform several basic operations using NumPy functions. These operations are fundamental to numerical computing and will help you become more familiar with how to manipulate data using NumPy arrays.
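
As a reminder of the general pattern, NumPy functions apply element by element, reductions collapse an array to a single number, and a seeded generator makes random draws reproducible. The values and the seed below are arbitrary assumptions made for illustration.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])  # hypothetical values

# Mathematical functions are applied element by element.
roots = np.sqrt(values)

# Reductions collapse an array to a single number.
total = values.sum()
largest = values.max()

# A seeded generator makes random draws reproducible.
rng = np.random.default_rng(seed=0)
samples = rng.uniform(0.0, 1.0, size=4)

print(roots, total, largest, samples.shape)
```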

> **Questions**
>
> 1. Using NumPy functions, perform the following operations:
>    - Compute the square root of 10.
>    - Calculate the tangent, sine, and cosine of $\pi$.
>    - Compute the base-10 logarithm of 10, the natural logarithm of 10, and the exponential of 1.
>
> 2. Generate a Gaussian random vector $\mathbf{x}$ with 100 elements, where $\mathbf{x} \sim \mathcal{N}(0, 1)$. This means the vector should have a mean of 0 and a standard deviation of 1. Use the appropriate NumPy function to generate this random vector.
>
> 3. Using NumPy functions, estimate the following for the vector $\mathbf{x}$:
>    - The average of $\mathbf{x}$, denoted as $\langle \mathbf{x} \rangle$.
>    - The standard deviation of $\mathbf{x}$, calculated as $\sqrt{\langle \mathbf{x}^2 \rangle - \langle \mathbf{x} \rangle^2}$.
>    - The minimum value of $\mathbf{x}$ and the maximum value of $\mathbf{x}$.
>
> 4. Create two five-element vectors, $\mathbf{v}_1$ and $\mathbf{v}_2$, with any values of your choice.
>
> 5. Calculate the element-wise product of $\mathbf{v}_1$ and $\mathbf{v}_2$. Then calculate the dot product $\mathbf{v}_1\cdot\mathbf{v}_2$ and the outer product $\mathbf{v}_1 \otimes \mathbf{v}_2$.

### 3.3. Stacking & Reshaping

In this section, you'll practice creating and reshaping arrays, as well as stacking arrays both horizontally and vertically. Make sure you are familiar with the basic array manipulation techniques before attempting these exercises. If needed, refer to the previous sections to refresh your knowledge on creating arrays and reshaping them.
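
The following sketch recalls how reshaping and stacking change the shape of an array. It uses a small hypothetical array rather than the `angles` array of the questions.

```python
import numpy as np

a = np.arange(6)           # six elements: 0, 1, ..., 5
grid = a.reshape(2, 3)     # 2 rows and 3 columns, same data
reversed_a = a[::-1]       # the same elements in reverse order

side_by_side = np.hstack([a, reversed_a])  # shape (12,)
on_top = np.vstack([a, reversed_a])        # shape (2, 6)

print(grid.shape, side_by_side.shape, on_top.shape)
```

Note that reshaping only works when the total number of elements is preserved.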

> **Questions**
>
> 1. Create an array called `angles` containing 30 evenly spaced values ranging from $-\pi$ to $\pi$.
>
> 2. Reshape the array to a 10 $\times$ 3 matrix and store it in a variable called `angles_matrix`.
>
> 3. Create `angles_stacked_horizontally`, which is a stack of the `angles` array with itself in reverse along the horizontal axis.
>
> 4. Create `angles_stacked_vertically`, which is a stack of the `angles` array with itself in reverse along the vertical axis.

### 3.4. Read a Comma-Separated Values (CSV) file with NumPy

In this section, you'll learn how to read CSV files using NumPy. CSV is a common file format used for storing tabular data, and it is widely used in data analysis. You will use the `numpy.genfromtxt` or `numpy.loadtxt` functions to import the dataset. You’ll also briefly explore some of the operations that can be done to analyze the data you import.

For this exercise, you'll work with the famous [Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), which contains measurements of iris flowers, including sepal length, sepal width, petal length, and petal width. The dataset has 150 samples and will allow you to practice reading and analyzing CSV files. 
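
To show how `numpy.genfromtxt` handles headers, column selection, and missing values, here is a minimal sketch that reads a small in-memory table. The CSV content and column names are hypothetical and stand in for a file on disk; the layout of `data/iris.csv` may differ.

```python
import io

import numpy as np

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = """length,width,label
1.0,2.0,a
3.0,,b
5.0,6.0,c
"""

table = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,    # ignore the header row
    usecols=(0, 1),   # keep only the numeric columns
)

print(table.shape)
print(np.isnan(table).sum())  # missing fields are read as NaN
```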

> **Questions**
>
> 1. Load the file `data/iris.csv`, excluding the last column (which contains the flower variety), and store the result in a variable called `iris_data`. Display the shape of the array.
>
> 2. Explore each column of the `iris_data` array. Calculate and display the following statistics for each column: data type, minimum, maximum, mean, standard deviation, and the count of NaN (Not a Number) values.

### 3.5. Matrices

In this section, you'll work with matrices, which are two-dimensional arrays of numbers. You will learn how to create different types of matrices using `numpy` and perform basic matrix operations. These exercises will give you hands-on experience with the creation and manipulation of matrices in Python.
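
As a reminder, the `@` operator performs matrix multiplication and the `.T` attribute gives the transpose. The small matrices below are hypothetical and only illustrate how shapes combine.

```python
import numpy as np

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
identity = np.eye(2)
column = np.array([[1.0], [1.0]])  # shape (2, 1)

transposed = matrix.T              # rows and columns swapped
product = matrix @ column          # matrix product, shape (2, 1)
unchanged = identity @ matrix      # multiplying by the identity changes nothing

print(transposed.shape, product.shape)
```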

> **Questions**
>
> 1. Create a 4 x 4 identity matrix.
> 
> 2. Create a 10 x 10 matrix filled with ones.
> 
> 3. Create a 6 x 6 matrix of random floats, with values between 0 and 1.
> 
> 4. Create a 6 x 1 array of random integers between 0 and 6.
> 
> 5. Find the transpose of the matrix created in question 4.
> 
> 6. Multiply the matrix from question 4 by the vector created in question 5.

## 4. Matplotlib in a Nutshell

Python offers several libraries for data visualization, with [Matplotlib](https://matplotlib.org/) being the most widely used. It is a versatile library that supports static, animated, and interactive visualizations. You can use Matplotlib in Python scripts, Jupyter notebooks, web applications, and graphical user interfaces. In Jupyter notebooks, Matplotlib is integrated with the help of the [IPython library](https://ipython.readthedocs.io/en/stable/). By default, Matplotlib displays plots as static images inline within the notebook.

If you'd like to display interactive plots within the notebook, you can use the magic command `%matplotlib widget`. This command requires the installation of the [ipympl library](https://matplotlib.org/ipympl/), a backend that renders Matplotlib figures interactively. The documentation for this magic command is available [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-matplotlib).
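
A typical figure can be built with Matplotlib's object-oriented interface, as in the minimal sketch below. The data, labels, and styling are arbitrary choices made for illustration; in a notebook, the figure is displayed inline after the cell runs.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 1.0, 50)

# Create a figure with a single set of axes.
fig, ax = plt.subplots()

# Plot a curve with an explicit color, linestyle, and legend label.
ax.plot(x, x**2, color="tab:blue", linestyle="--", label="$x^2$")

# Decorate the axes.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A minimal example")
ax.legend()
ax.grid(True)
```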

```python
import matplotlib.pyplot as plt
```

### 4.1. Warm-up

In this exercise, you will work with the sine and cosine functions. Follow these steps:

> **Questions**
>
> 1. **Create a vector** $\theta \in [0, 2\pi]$ with 200 points.
> 2. **Display the functions** $\sin(\theta)$ and $\cos(\theta)$ on the same plot.
> 3. **Add a title** to the plot as "Famous trigonometric functions." Also, **add axes labels** and a **legend** to distinguish between the two functions.
> 4. **Add a grid** to the plot for better visibility.
> 5. **Customize the appearance** of the functions by changing the color and linestyle. The sine function should appear in red with a dotted linestyle, and the cosine function should be plotted in green with a dash-dot linestyle.

### 4.2. Data and Fit

In this exercise, you will create a dataset and fit a function to it. Follow these steps:
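
For reference, `np.polyfit` returns polynomial coefficients ordered from the highest degree to the lowest, and `np.poly1d` turns them into a callable polynomial. The sketch below uses a hypothetical noise-free line, so the fit recovers the slope and intercept up to rounding error.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noise-free line with slope 2 and intercept 1

# Fit a degree-1 polynomial; coefficients are ordered highest degree first.
coefficients = np.polyfit(x, y, deg=1)

# Wrap the coefficients in a callable polynomial object.
line = np.poly1d(coefficients)

print(coefficients)
print(line(4.0))
```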

> **Questions**
>
> 1. Create a vector $x \in [0, 50]$ with 100 points.
> 2. Create a vector $y = ax + b + n$, where $a$ and $b$ are two scalars of your choice and $n$ is a random vector of noise of the same size as $x$, with a variance and a mean of your choice.
> 3. Make a scatter plot of $y=f(x)$.
> 4. Plot the fit of the function in red (`np.polyfit` and `np.poly1d`).
> 
> **Answers**
>   
> Please see the code cells below for the answers to the questions.

## 5. Pandas in a Nutshell

The [Pandas](https://pandas.pydata.org/) library is one of the most important libraries for data analysis in Python. It provides many functions to manipulate tabular data stored in objects called `DataFrame`. This section covers the basics of Pandas and of data manipulation.
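
To give an idea of what a `DataFrame` looks like, here is a minimal sketch built from a Python dictionary. The column names and values are hypothetical and unrelated to the dataset used in the exercises.

```python
import pandas as pd

# Hypothetical table, for illustration only.
frame = pd.DataFrame(
    {
        "station": ["A", "B", "C", "D"],
        "depth_km": [10.0, 35.0, 5.0, 70.0],
    }
)

first_rows = frame.head(2)               # first two rows
n_rows, n_columns = frame.shape          # (rows, columns)
mean_depth = frame["depth_km"].mean()    # a column behaves like a labeled array

print(first_rows)
print(n_rows, n_columns, mean_depth)
```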

### 5.1. Warm-up

In this exercise, you will work with the `iris` dataset. Follow these steps:

> **Questions**
>
> 1. Read the file `data/iris.csv` and store it in a variable called `iris_data` with the function `read_csv` of the library `pandas`.
> 2. Display the first 5 rows of the dataset.
> 3. Display the last 7 rows of the dataset.
> 4. Display the shape of the dataset.
> 5. Display basic statistics of the dataset with a single method.
>
> **Answers**
>
> Please see the code cells below for the answers to the questions.

```python
import pandas as pd
```

### 5.2. Data manipulation and display

In this exercise, you will work with the `iris` dataset. Follow these steps:
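
Two operations are particularly useful here: selecting rows with a boolean mask and computing statistics per group. The sketch below uses a hypothetical table with made-up column names; the columns of `data/iris.csv` may be named differently.

```python
import pandas as pd

# Hypothetical table, for illustration only.
frame = pd.DataFrame(
    {
        "kind": ["x", "y", "x", "y"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Keep only the rows where the condition is True.
only_x = frame[frame["kind"] == "x"]

# Compute the mean of a column for each category.
means = frame.groupby("kind")["value"].mean()

print(only_x)
print(means)
```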

> **Questions**
>
> 1. Display the mean of the sepal length of the dataset (all varieties at once).
> 2. Display the standard deviation of the sepal length of the dataset (all varieties at once).
> 3. Plot the histogram of the sepal length for the variety "Setosa".
> 4. Scatter plot the sepal length versus the sepal width, and color the points according to the variety of the flower.