# Anaconda Vs Miniconda

Here is a table that summarizes the main differences between Anaconda and Miniconda:

| **Feature** | **Anaconda** | **Miniconda** |
|-------------|--------------|----------------|
| **Installation Size** | Large (Several GB, depending on packages) | Small (Less than 100MB) |
| **Content** | - Conda + <br> - Over 250 popular scientific and data processing packages by default | - Conda (package and environment manager) |
| **Default Packages** | Contains a large number of popular scientific packages (such as NumPy, Pandas, Jupyter, etc.) | No additional packages are installed by default |
| **Usage** | Ready-to-use for most scientific and data analysis tasks | You manually install the packages you need |
| **Size after Installation** | Typically larger due to the large number of packages installed | The size depends on the packages you choose to install later |
| **Ideal for** | New users, education, workshops, and when you want a set of scientific and data processing packages ready-to-use | Advanced users, lightweight installations, servers, containers, or when space is a concern |

* **Anaconda**: A full-fledged data science platform that comes with over 250 popular scientific and data processing packages by default. It is ideal for new users or those who want a set of scientific and data processing packages ready to use.

* **Miniconda**: A lightweight version of Anaconda that comes with only the Conda package and environment manager. Miniconda is designed for advanced users who want more control over their Python environment and packages.

# Set up an environment

## Step 1:
Open a terminal.

## Step 2:
List the available environments:

```bash
conda env list
```

## Step 3:
Create and activate an environment:

```bash
# create the environment
conda create --name env_name python=3.8

# activate the environment
conda activate env_name
```

## Step 4 (OPTIONAL):
List packages:

```bash
conda list
```
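
To confirm from Python which environment is actually in use, here is a minimal sketch. It assumes a standard conda setup, where `conda activate` sets the `CONDA_DEFAULT_ENV` variable; the helper name `describe_environment` is made up for this example.

```python
import os
import sys

def describe_environment():
    """Return basic facts about the Python interpreter that is running."""
    return {
        # Full path of the running interpreter
        "executable": sys.executable,
        # Root folder of the environment the interpreter belongs to
        "prefix": sys.prefix,
        # Name set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
    }

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `executable` does not point inside the environment you just activated, the terminal is probably still using another Python installation.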

## Install and launch Jupyter Notebook and JupyterLab

```bash
# install jupyter
conda install jupyter

# install jupyterlab
conda install jupyterlab

# launch jupyter notebook
jupyter notebook

# launch jupyter lab
jupyter lab
```

Find more conda commands in the conda cheat sheet: https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf

## Kernel

### What is a kernel?

A kernel is like a "brain" for your Jupyter notebook. It's the engine that actually runs your code and returns the results. Think of it as an interpreter that understands and executes the programming language you're using.

Key points to understand:

1. **Purpose**: A kernel allows Jupyter to execute code in a specific programming language (like Python, R, or Julia).

2. **Connection to environments**: While a kernel is associated with a specific environment (like a Conda environment), it's not stored inside that environment. Instead, it's registered with Jupyter in a separate location.

3. **Flexibility**: You can have multiple kernels for the same language, each connected to a different environment. This allows you to switch between different sets of installed packages without changing notebooks.

### Kernel vs. Environment

- An **environment** is like an isolated room containing a specific set of tools (packages and a Python version).
- A **kernel** is like a messenger that knows how to use the tools in a specific room and communicate the results back to Jupyter.

### The ipykernel Package

The `ipykernel` package is what creates this "messenger" for Python environments. Once it is installed in an environment:

1. It provides the components Jupyter needs to communicate with that environment's Python interpreter.
2. It lets you register the environment as a kernel with Jupyter (using `python -m ipykernel install`), which makes it available in your notebook's kernel selection menu.
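
To make the registration step concrete: registering a kernel essentially writes a small `kernel.json` file that tells Jupyter which Python interpreter to start. The sketch below builds a comparable spec by hand and writes it to a temporary folder. The helper name `build_kernel_spec` and the display name are illustrative only; in practice `python -m ipykernel install` generates this file for you.

```python
import json
import sys
import tempfile
from pathlib import Path

def build_kernel_spec(python_path, display_name):
    """Build a dict shaped like the kernel.json that describes a Python kernel."""
    return {
        # Command Jupyter runs to start the kernel; {connection_file} is filled in by Jupyter
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }

spec = build_kernel_spec(sys.executable, "Python (env_name)")

# Write to a throwaway folder so that no real Jupyter configuration is touched
with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "env_name" / "kernel.json"
    spec_file.parent.mkdir(parents=True)
    spec_file.write_text(json.dumps(spec, indent=2))
    print(spec_file.read_text())
```

The important part is the first entry of `argv`: it is the interpreter of one specific environment, which is how a kernel stays tied to that environment.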

Let's visualize:

```txt
╔════════════════════════════════════════════════════════════════════════════════════════╗
║                                  SYSTEM OVERVIEW                                        ║
╠════════════════════════╦═══════════════════════════════╦═══════════════════════════════╣
║    USER INTERFACE      ║           KERNEL              ║     SYSTEM RESOURCES          ║
║                        ║                               ║                               ║
║  ┌──────────────────┐  ║   ┌─────────────────────┐    ║    ┌────────────────────┐    ║
║  │    YOUR CODE     │  ║   │    CODE PARSER      │    ║    │        CPU         │    ║
║  │ Python/Notebook  │──╬──>│                     │    ║    │                    │    ║
║  └──────────────────┘  ║   └─────────┬───────────┘    ║    └─────────▲──────────┘    ║
║                        ║             │                 ║            │              ║
║                        ║             │                 ║            │              ║
║                        ║             ▼                 ║            │              ║
║                        ║   ┌─────────────────────┐    ║            │              ║
║                        ║   │   CODE EXECUTOR     │────╬────────────┘              ║
║                        ║   │                     │    ║                           ║
║  ┌──────────────────┐  ║   └─────────┬───┬───────┘    ║    ┌────────────────────┐    ║
║  │     OUTPUT       │  ║             │   │            ║    │       RAM          │    ║
║  │  Text/Graphs     │  ║             │   │            ║    │     (Memory)       │    ║
║  │   Results        │◀─╬─────────────┘   │            ║    └────────────────────┘    ║
║  └──────────────────┘  ║                 │            ║                               ║
║                        ║   ┌─────────┐   ┌─────────┐  ║    ┌────────────────────┐    ║
║                        ║   │  MEM    │   │  I/O    │  ║    │   FILE SYSTEM      │    ║
║                        ║   │  MGR    │   │ HANDLER │──╬───>│                    │    ║
║                        ║   └─────────┘   └─────────┘  ║    └────────────────────┘    ║
║                        ║                      │       ║                               ║
║                        ║                      │       ║    ┌────────────────────┐    ║
║                        ║                      └───────╬───>│      NETWORK       │    ║
║                        ║                              ║    │                    │    ║
╠════════════════════════╬═══════════════════════════════╬═══════════════════════════════╣
║                                       DATA FLOW                                         ║
╠════════════════════════╦═══════════════════════════════╦═══════════════════════════════╣
║ 1. Code Input          ║ 2. Parse & Execute            ║ 4. Resource Usage             ║
║ 5. Results Output      ║ 3. Memory & I/O Management    ║    (CPU/RAM/Files/Network)   ║
╚════════════════════════╩═══════════════════════════════╩═══════════════════════════════╝
```

Let's explain the flow shown in the diagram:

1. When you run code, it first goes to the kernel's Parser
2. The Parser converts your code into a form the kernel can execute
3. The Executor then coordinates with different components:
   - Memory Manager: Handles RAM allocation and garbage collection
   - I/O Handler: Manages file operations, network requests, etc.

4. The kernel interacts with system resources:
   - CPU: For computation
   - RAM: For storing variables and data
   - File System: For reading/writing files
   - Network: For internet access, API calls, etc.

5. Finally, results are sent back as output (text, graphs, errors, etc.)

This shows why the kernel is so important - it's the bridge between your code and the computer's resources, handling all the complex coordination needed to execute your code safely and efficiently.
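
The parse, execute, and output steps can be imitated with a toy example. This is only a sketch under simplifying assumptions: the class name `ToyKernel` is made up, and a real kernel such as `ipykernel` runs as a separate process and exchanges messages with Jupyter rather than being called directly.

```python
import ast
import contextlib
import io

class ToyKernel:
    """A toy stand-in for a kernel: parse code, execute it, capture the output."""

    def __init__(self):
        # Variables persist between cells, as they do in a notebook session
        self.namespace = {}

    def run_cell(self, source):
        # Parse the code (steps 1 and 2 of the flow)
        try:
            tree = ast.parse(source)
        except SyntaxError as err:
            return f"SyntaxError: {err.msg}"
        # Execute it while capturing anything printed (step 3)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(compile(tree, "<cell>", "exec"), self.namespace)
        except Exception as err:
            return buffer.getvalue() + f"{type(err).__name__}: {err}"
        # Send the captured text back as the result (step 5)
        return buffer.getvalue()

kernel = ToyKernel()
kernel.run_cell("x = 2")                  # first "cell" defines a variable
print(kernel.run_cell("print(x * 21)"))   # second "cell" reuses it and prints 42
```

Because both calls share `self.namespace`, the second cell can see `x`, which mirrors how variables stay alive in a notebook until the kernel is restarted.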

### Why Create Different Kernels?

Creating different kernels allows you to:

1. Use different versions of Python or different sets of packages in different notebooks.
2. Keep your projects isolated, preventing package conflicts.
3. Share notebooks that others can run with the exact same environment you used.

### In Practice

When you create a new Conda environment and want to use it in Jupyter:

1. Activate the environment
2. Install ipykernel
3. Register the new kernel with Jupyter

This process creates a bridge between your isolated Conda environment and Jupyter, allowing you to use that environment's specific tools and packages in your notebooks.


### How to create a kernel?

With the following command, you can create a kernel within an environment:

```bash
# install ipykernel
conda install -c anaconda ipykernel

# create a kernel within an environment
python -m ipykernel install --user --name env_name
```


## Why create different kernels?

Creating different kernels is useful for several reasons:

- **Flexibility**: You can use different programming languages and Python versions in your Jupyter notebooks, depending on the kernel you choose. For example, you can have a Python 3.8 kernel for one notebook and a Python 3.9 kernel for another notebook.

- **Isolation**: You can install packages specific to each kernel, without affecting the other kernels or the base environment. For example, you can have a kernel with scikit-learn 1.3.2 and another kernel with scikit-learn 1.2.1.

- **Reproducibility**: You can share your Jupyter notebooks with other users, by indicating the kernel associated with each notebook. This way, they will be able to run the code in the same conditions as you.

**Each kernel is associated with a specific Python environment.** When you register a new kernel, it points to the Python interpreter that ran the `ipykernel install` command, so activate the environment you want before registering. If you run the command from the base environment, the kernel uses the base environment.

When you run code in a Jupyter notebook, the kernel associated with that notebook is used to execute the code. If several kernels are registered, you can choose which one to use for each notebook.

To manage kernels, use `python -m ipykernel install --user --name env_name` to register a kernel and `jupyter kernelspec uninstall env_name` to delete one. The `jupyter kernelspec list` command displays the kernels that Jupyter can currently see.
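
To see which interpreter each registered kernel points to, one option is to read the `kernel.json` files directly. The sketch below is an illustration, not an official API: the helper name `read_kernel_specs` and the demo path are made up, and the folder to inspect is whatever `jupyter kernelspec list` reports on your machine (on Linux, user-level kernels typically live under `~/.local/share/jupyter/kernels`).

```python
import json
import tempfile
from pathlib import Path

def read_kernel_specs(kernels_dir):
    """Map each kernel name to the interpreter its kernel.json points to."""
    specs = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        # The folder name is the kernel name; argv[0] is the program Jupyter launches
        specs[spec_file.parent.name] = spec["argv"][0]
    return specs

# Demo on a throwaway folder containing one made-up kernel
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "env_name"
    demo.mkdir()
    (demo / "kernel.json").write_text(json.dumps({
        "argv": ["/opt/conda/envs/env_name/bin/python", "-m",
                 "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": "Python (env_name)",
        "language": "python",
    }))
    print(read_kernel_specs(tmp))
```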

> **Simply said : a notebook is like the client application that interacts with the kernel responsible for executing the code in the notebook.**

# How to install Python packages from within a Jupyter Notebook

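In a notebook, you can use `!pip install` to install a package, or `!conda install` to install it into a conda environment. Both commands run in a shell, so they act on whichever `pip` or `conda` that shell finds first. The `%pip install` and `%conda install` magics target the environment of the running kernel instead.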
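
One caveat is worth illustrating: a shell command started from a notebook uses the first `pip` found on the `PATH`, which is not necessarily the one that belongs to the notebook's kernel. A common workaround, sketched here, is to call pip through the kernel's own interpreter (`sys.executable`). The helper name `pip_install_command` is made up for this example.

```python
import subprocess
import sys

def pip_install_command(package):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]

command = pip_install_command("pandas")
print(" ".join(command))

# Uncomment to actually perform the installation:
# subprocess.check_call(command)
```

Printing the command first makes it easy to check which interpreter would receive the package before installing anything.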

## Differences between pip and conda

It is also possible to use `pip` commands to list and install packages. `pip` and `conda` are both package managers for Python, but they differ significantly in design, functionality, and usage. The following table highlights some of these differences:

| Feature | pip | conda |
|---------|-----|-------|
| **Nature** | Package manager for Python. | Package manager and environment management system, mainly for Python but also for other languages. |
| **Package sources** | PyPI (Python Package Index). | Conda channels, with `conda-forge` being a popular channel. |
| **Package format** | Wheels (`*.whl`) or source distributions (sdist). | Conda packages (`*.tar.bz2` or the newer `*.conda` format). |
| **Dependencies** | Manages Python dependencies. | Manages dependencies at both the language and system level. For example, `conda` can install non-Python packages like `gcc` or system libraries. |
| **Virtual environments** | Must be used with `virtualenv` or `venv` to manage virtual environments. | Integrates its own environment management system. |
| **Platforms** | Python only. | Python as well as other languages and tools. |
| **Installation of non-Python packages** | Limited, as it is specific to Python. | Can install non-Python packages, such as command-line tools or libraries written in C, C++, etc. |
| **Integration with systems** | Less integrated with systems like Anaconda or Miniconda. | Is the package manager and environment management system for Anaconda and Miniconda. |
| **Provenance** | Open source project maintained by the Python Packaging Authority (PyPA). | Open source project developed by Anaconda, Inc. |


In summary:
- `pip` is a package manager specifically for Python and is ideal for libraries and frameworks that are purely Python.
- `conda` is a more general package and environment manager that handles not only Python packages, but also system-level dependencies, making it ideal for projects that have complex dependencies or require non-Python libraries.
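
Both tools can end up installing packages into the same environment, so it can help to see which one installed what. The sketch below reads the `INSTALLER` record that installers normally leave in each package's metadata. It assumes that pip writes `pip` there and that conda-built Python packages typically carry `conda`; packages without the record are counted as `unknown`. The function name `count_by_installer` is made up for this example.

```python
from collections import Counter
from importlib import metadata

def count_by_installer():
    """Count installed Python distributions by the tool recorded as their installer."""
    counts = Counter()
    for dist in metadata.distributions():
        # INSTALLER is an optional one-line metadata file; read_text returns None if absent
        installer = (dist.read_text("INSTALLER") or "").strip()
        counts[installer or "unknown"] += 1
    return counts

for installer, count in count_by_installer().most_common():
    print(f"{installer}: {count} package(s)")
```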


In [7]:
# to install pandas
!pip install pandas

Collecting pandas
  Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting numpy>=1.26.0 (from pandas)
  Downloading numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.7/12.7 MB 390.8 kB/s eta 0:00:00
Downloading numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.0/16.0 MB 251.5 kB/s eta 0:00:00
Downloading tzdata-2024.2-py2.py3-none-any.whl (346 kB)
Installing collected packages: tzdata, numpy, pandas
Successfully installed numpy-2.1.2 pandas-2.2.3

In [8]:
# install sklearn
!pip install scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting scipy>=1.6.0 (from scikit-learn)
  Downloading scipy-1.14.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Downloading threadpoolctl-3.5.0-py3-none-any.whl.metadata (13 kB)
Downloading scikit_learn-1.5.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 459.5 kB/s eta 0:00:00
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Downloading scipy-1.14.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 MB 271.2 kB/s eta

In [6]:
# import pandas
import pandas as pd

ModuleNotFoundError: No module named 'pandas'
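
A `ModuleNotFoundError` like this one usually means either that the cell ran before the package was installed, or that the package was installed into a different environment than the one the kernel uses. A small check, relying only on the standard library (the helper name `is_importable` is illustrative):

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if the running interpreter can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None

print("kernel interpreter:", sys.executable)
for name in ("pandas", "sklearn"):
    print(name, "->", "available" if is_importable(name) else "missing")
```

If a package shows as missing right after an install, compare the interpreter path printed here with the location reported by the install command.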

In [4]:
# load a dataset from sklearn
from sklearn.datasets import load_iris

# load dataset
iris = load_iris()

# print the names of the features
print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [5]:
# convert iris dataset to pandas dataframe
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# print the first 5 rows of the dataframe
df.head()

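|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|-------------------|------------------|-------------------|------------------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |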


In [1]:
# list packages installed
!conda list

# packages in environment at /opt/conda:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anaconda-anon-usage       0.4.4           py312hfc0e8ea_100  
anyio                     4.6.0              pyhd8ed1ab_1    conda-forge
archspec                  0.2.3              pyhd3eb1b0_0  
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py312h66e93f0_5    conda-forge
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4              pyhd8ed1ab_0    conda-forge
attrs                     24.2.0             pyh71513ae_0    conda-forge
babel                     2.14.0             pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.12.3      

In [7]:
# list packages with pip
!pip list

Package               Version
--------------------- ------------
aiofiles              22.1.0
aiosqlite             0.18.0
anyio                 3.5.0
argon2-cffi           21.3.0
argon2-cffi-bindings  21.2.0
asttokens             2.0.5
attrs                 23.1.0
Babel                 2.11.0
backcall              0.2.0
beautifulsoup4        4.12.2
bleach                4.1.0
Brotli                1.0.9
certifi               2023.7.22
cffi                  1.15.1
charset-normalizer    2.0.4
colorama              0.4.6
comm                  0.1.2
contourpy             1.2.0
cryptography          41.0.3
cycler                0.12.1
debugpy               1.6.7
decorator             5.1.1
defusedxml            0.7.1
entrypoints           0.4
executing             0.8.3
fastjsonschema        2.16.2
fonttools             4.47.0
idna                  3.4
ipykernel             6.25.0
ipython               8.15.0
ipython-genutils      0.2.0
ipywidgets            8.0.4
jedi                  0.1

In [25]:
# create requirements.txt listing the environment's packages in a format conda can reinstall
!conda list --export > requirements.txt

## Installing packages from requirements.txt

```shell
# Install with conda (prefix the command with ! to run it from a notebook cell)
conda install -c conda-forge --file requirements.txt
```
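
Before reinstalling, it can be useful to inspect what such a file pins. The sketch below assumes the file was written with `conda list --export`, which produces one `name=version=build` spec per line plus `#` comment lines. The function name `parse_conda_export` is made up, and the sample text reuses two entries from the `conda list` output shown earlier.

```python
def parse_conda_export(text):
    """Parse `name=version=build` lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        parts = line.split("=")
        # A line without a version (just a name) maps to None
        packages[parts[0]] = parts[1] if len(parts) > 1 else None
    return packages

sample = """\
# sample export (illustrative)
anyio=4.6.0=pyhd8ed1ab_1
attrs=24.2.0=pyh71513ae_0
"""
print(parse_conda_export(sample))
```

To apply it to a real file, pass the file's contents, for example `parse_conda_export(open("requirements.txt").read())`.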

## Organizing folders to start a project from scratch

It's highly recommended to use a clear folder structure to organize your project. For example, you can create a folder called `myproject` and then create the folders `notebooks`, `src` and `data` inside it.

```python
import os

# Create a folder called `myproject` and move into it
os.makedirs("myproject", exist_ok=True)
os.chdir("myproject")

# Create folders notebooks, src and data
for folder in ("notebooks", "src", "data"):
    os.makedirs(folder, exist_ok=True)

# function that walks through all folders and creates __init__.py in each one
def create_init_files(root_dir):
    """Create __init__.py in each folder"""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for dirname in dirnames:
            init_file = os.path.join(dirpath, dirname, '__init__.py')
            if not os.path.exists(init_file):
                with open(init_file, 'w') as f:
                    f.write('')
```

In Python, a package is simply a way of organizing related modules in a single directory hierarchy. The `__init__.py` file serves as an initializer for the package, and its presence indicates to Python that a directory should be treated as a package or a subpackage.

When a package is imported, the `__init__.py` file is implicitly executed, and the objects it defines are bound to names in the package namespace. The `__init__.py` file can contain the same Python code as any other module, and Python adds some additional attributes to the module when it is imported.

The `__init__.py` file is also used to create nested packages. For example, if you have a directory `foo` that contains an `__init__.py` file, you can create a subdirectory `bar` that also contains an `__init__.py` file. This creates a nested package called `foo.bar`.
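
The `foo.bar` example can be tried out in a throwaway folder. This is a minimal sketch: the file contents (`LOADED`, `greet`) are invented for the demonstration, and the temporary folder is added to `sys.path` only for the duration of the import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build foo/__init__.py and foo/bar/__init__.py
    bar_dir = Path(tmp) / "foo" / "bar"
    bar_dir.mkdir(parents=True)
    (Path(tmp) / "foo" / "__init__.py").write_text("LOADED = 'foo initialized'\n")
    (bar_dir / "__init__.py").write_text("def greet():\n    return 'hello from foo.bar'\n")

    sys.path.insert(0, tmp)  # make the temporary folder importable
    try:
        importlib.invalidate_caches()
        # Importing foo.bar runs foo/__init__.py first, then foo/bar/__init__.py
        foo_bar = importlib.import_module("foo.bar")
        message = foo_bar.greet()
        loaded = sys.modules["foo"].LOADED
    finally:
        sys.path.remove(tmp)

print(loaded)
print(message)
```

The value of `LOADED` is available even though only `foo.bar` was imported, which shows that the parent package's `__init__.py` runs implicitly.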

In [4]:
!ls

Intro_conda_jupyterNB_EN.ipynb	data			     requirements.txt
Intro_conda_jupyterNB_FR.ipynb	data_enginering_machine.pem  setup.py
README.md			docker-compose.yml	     src
connect_to_vm.md		notebooks


In [2]:
# folders creation
!mkdir notebooks src data

mkdir: cannot create directory ‘notebooks’: File exists
mkdir: cannot create directory ‘src’: File exists
mkdir: cannot create directory ‘data’: File exists


In [35]:
# function to list all folders and create __init__.py in each folder
#!find . -type d -exec touch {}/__init__.py \;
import os

def create_init_files(root_dir):
    """Create __init__.py in each folder"""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for dirname in dirnames:
            init_file = os.path.join(dirpath, dirname, '__init__.py')
            if not os.path.exists(init_file):
                with open(init_file, 'w') as f:
                    f.write('')

create_init_files('.')
print("done!")

done!


Next, you can create a file called `test_py.py` in the `src` folder.

```python
## test_py.py
# imports section
import sys

# create a function that prints hello world
def hello():
    print("Hello World!")

if __name__ == "__main__":
    hello()
```

To use the function from a notebook, import it from the `src` folder:

In [5]:
# import a function from the src folder
from src.test_py import hello

hello()

Hello World!


## Delete a virtual environment

```python
# delete the project folder (Linux/macOS); this does not remove a conda environment
!rm -rf myproject

# delete the conda environment named myproject (works on Linux, macOS and Windows)
!conda env remove -n myproject
```

## Working with GitHub
You first need to create a repo on GitHub. Next, you need to `clone` the repo to your local machine to work with it.

The other option is to start working locally and then link your local folder to your repo.

```bash
# create README.md
echo "# test" >> README.md

# initialize the repo
git init

# add README.md to the git index
git add README.md

# commit changes (save the changes)
git commit -m "first commit"

# rename the current branch to main (a branch is like a version of your work)
git branch -M main

# add remote
git remote add origin https://github.com/ssime-git/test.git

# push the code
git push -u origin main
```


## Creating a GitHub repo from a command prompt (for advanced uses)

In general, you create GitHub repos from a web browser, which is the simplest method. It is also possible to create a GitHub repository from a command prompt using the `hub` tool, a Git command-line extension that adds GitHub features. Its `hub create` command creates a new repository on GitHub. Here are the steps to follow:

1. Install `hub` by following the instructions on the official `hub` website.

2. Open a command prompt and go to the local directory that you want to push to GitHub.

3. Run the `git init` command to initialize a new Git repository.

4. Run the `hub create` command to create a new repository on GitHub.

5. Follow the prompts to configure the repository, including the name, description and visibility.

6. Commit your files, then run the `git push -u origin master` command (use `main` instead of `master` if that is your local branch name) to push your local code to the remote repository.

This will create a new repository on GitHub from the command prompt and push your local code to the remote repository.

For more details see [hub's website](https://hub.github.com/).

# Conclusion

You are ready to start your projects with GitHub, Jupyter Notebook, VS Code and Python!

# A little something to go even further

You can use JupyterLab with Docker without having to install Anaconda or Miniconda locally. In this case, you only need Docker installed on your machine, and you start everything with the following command: `docker-compose up`.