# Getting Started with Pandas

Before we get started, you may need to install Pandas, depending on how and where you are running Python.
This guide is intended for people running a local installation of Python and using Jupyter Notebook or JupyterLab.

There are many ways to install Pandas. Ours is just one. [You can read more here.](https://pandas.pydata.org/docs/getting_started/install.html)

___

## Installing (or upgrading) Pandas from PyPI

Pandas can be installed via `pip` from [PyPI](https://pypi.org/project/pandas).

The command below installs the `pandas` library using `pip`, Python's package installer.\
If you have not installed Pandas before, it downloads and installs the latest version of Pandas along with its dependencies.

In [1]:
  pip install pandas



Note: you may need to restart the kernel to use updated packages.


`pip` commands are most often run through a shell such as the **Terminal** on macOS or the **Command Prompt** on Windows.\
However, you can run shell commands directly from Jupyter notebooks by adding an exclamation mark `!` in front of the command:

In [2]:
! pip install pandas



Adding the `--upgrade` flag installs Pandas if it is missing and, if Pandas is already installed, updates it to the latest version. This is useful for making sure you have the newest features and bug fixes.

In [3]:
pip install --upgrade pandas



Collecting pandas
  Downloading pandas-2.2.3-cp311-cp311-macosx_10_9_x86_64.whl.metadata (89 kB)
Downloading pandas-2.2.3-cp311-cp311-macosx_10_9_x86_64.whl (12.6 MB)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 2.2.2
    Uninstalling pandas-2.2.2:
      Successfully uninstalled pandas-2.2.2
Successfully installed pandas-2.2.3

Note: you may need to restart the kernel to use updated packages.


Again, we can run this shell command directly in Jupyter notebooks:

In [4]:
! pip install --upgrade pandas
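
To confirm which version ended up in the current environment, a minimal sketch like the following can help. It uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this illustration and is not part of Pandas or `pip`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Look up pandas in the environment that runs this code.
pandas_version = installed_version("pandas")
if pandas_version is None:
    print("pandas is not installed in this environment")
else:
    print(f"pandas {pandas_version} is installed")
```

If the reported version does not match what `pip` just installed, restarting the kernel (as the note in the output suggests) is a reasonable first step.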





````{admonition} ModuleNotFoundError
:class: attention dropdown

When using Pandas, you may encounter a `ModuleNotFoundError`. This error means that Python could not find a library that the code tries to import. With Pandas, the missing library is often one of its optional dependencies.

When you install the `pandas` library for Python, it requires other libraries to function correctly. These required libraries are called dependencies.
Dependencies are external libraries that provide additional functionality or capabilities that the main library (in this case, Pandas) relies on to operate.

When you install Pandas using `pip`, Python's package installer, it automatically installs any required dependencies. However, optional dependencies are not installed by default and must be installed separately if needed.

[You can read more about Pandas dependencies here.](https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#optional-dependencies)

In our case, we will be working with Excel files. All Excel-related dependencies can be installed in one go with this command:

```
pip install "pandas[excel]"
```

Or, if run in a Jupyter notebook:

```
! pip install "pandas[excel]"
```

````
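
Before installing anything, it can be useful to see which optional packages are already available. The sketch below checks this with the standard library's `importlib.util.find_spec`. The two package names are an assumption chosen for illustration (`openpyxl` and `xlrd` are commonly used for Excel files), and the helper `missing_packages` is hypothetical, not a Pandas function.

```python
from importlib.util import find_spec

# Assumption: these are two of the Excel-related optional packages.
# Adjust the list to match the file formats you actually work with.
excel_packages = ["openpyxl", "xlrd"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(excel_packages)
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All listed Excel packages are available")
```

Any package reported as missing would be a candidate for the `pip install "pandas[excel]"` command described above.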

___

## Importing Pandas

Even though we have now installed Pandas, we still need to import Pandas when we want to use it in our code.\
This is conventionally done at the top of the script (together with any other imports) to make it easier for future readers (including ourselves!) to see which packages are used.

Aliasing `pandas` as `pd` is a widely adopted convention that shortens the code needed to access its functionality.\
After the import statement, you can use `pd` to access everything the `pandas` library provides.

In [5]:
# This line imports the pandas library and aliases it as 'pd'.
import pandas as pd


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ipykernel_launcher.py", line 18, in <module>
    app.launch_new_instance()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/traitlets/config/application.py", line 1077, in launch_instance
    app.start()
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/ipyk

AttributeError: _ARRAY_API not found
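
Once the import succeeds, everything in the library is reached through the `pd` alias. The following is a minimal sketch, assuming Pandas is installed and imports cleanly; the sample data and the variable name `people` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to show the alias in action.
people = pd.DataFrame({"name": ["Ada", "Grace"], "born": [1815, 1906]})

# The installed version is available as an attribute of the module.
print(pd.__version__)

# shape is (number of rows, number of columns).
print(people.shape)
```

Here `pd.DataFrame` and `pd.__version__` are both accessed through the alias, which is the pattern used throughout most Pandas code.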

