# 1. What is a Jupyter Notebook?

- is a file, with extension `.ipynb`
- contains 
  1. text (to be read) 
  2. code (to be run, or executed);
- the text is written in the Markdown markup language and the code in the Python programming language
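
Under the hood, an `.ipynb` file is a plain JSON document that lists the cells one after the other. The sketch below is only an illustration: the notebook content is written by hand and heavily simplified (real files carry more metadata), but it shows how text cells and code cells sit side by side in the same file.

```python
import json

# A hand-written, minimal notebook-like structure (illustrative only;
# real .ipynb files contain additional metadata fields).
notebook_json = """
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Welcome!"]},
    {"cell_type": "code", "source": ["print('Hello World!')"]}
  ],
  "nbformat": 4,
  "nbformat_minor": 5
}
"""

notebook = json.loads(notebook_json)

# Collect the type of each cell, in the order they appear in the file
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```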



## Start Jupyter Notebook through Binder

- Go to our GitHub repo at https://github.com/enury/2024-01-18-collation-intro
- Click the button: ![image.png](https://mybinder.org/static/images/badge_logo.svg?v=117793ab76524046ef44e2d2d5af220c)

It may take a little while to start up.

![image.png](images/jupyter-lab-homepage.png)

Do you see something like this? Hooray!

## Home page

The left side panel shows a file manager, where you can navigate through the folders as usual.

- **Data**: the data to use for the collation
- **Images**: the images that illustrate the notebooks, like the one just above
- **Notebooks**: the notebook files, with the *.ipynb* extension

Double-click `Partie1-Jupyter-python-intro.ipynb` to open the notebook.

## Basic operations in Jupyter Notebook

First, a Jupyter Notebook is made of **cells**, just like a prose text is made of paragraphs. The **borders** of a cell appear when you click on it. Try clicking anywhere in the file and you'll see them!

A cell can contain:

1. **Text in Markdown**: this is the styled text mode that we use when we’re not writing code, to add documentation and instructions.
2. **Code in Python**.

Let's see how this works...

### Cell actions

![image.png](images/jupyter-barre1.png)

1. Add a new cell
2. Render the text / run the code (CTRL+Enter)
3. Change the cell type

### Text cells

This cell contains text! Try double-clicking on it to reveal the formatting:

- Three hash signs (`#`) mark a level-three heading
- Everything between single asterisks is *italic*
- Everything between double asterisks is **bold**
- Hyphens create a list

You can edit the text: for example, change the heading level by removing or adding hash signs.

Now click the ![run](images/jupyter-bouton1.png) button to render the text.

### Markdown

This syntax is called Markdown.

Here are a [cheat sheet](https://www.markdownguide.org/cheat-sheet) and a [tutorial](https://www.markdowntutorial.com/) (Caution: the French translation may contain errors).

### Code cells

As with text, you need to double-click to "enter" the cell and edit it.

Then click ![run](images/jupyter-bouton1.png) to run the code!

In [11]:
print("Hello everyone!")

Hello everyone!


#### Input and output sections in code cells

The sequence above shows a key aspect of Jupyter code cells: each one has an **input section** and an **output section**. The input section is where you write the code; it is marked to the left of the editing box with the word “In” followed by a number in square brackets. The output section, below the input section, is where the result of running your code appears.

## *Your turn!*

### Text cell

1. Select the *Insert text* cell by clicking on it, then press the **+** button in the toolbar.
2. Once in your new cell, change its type by selecting _Markdown_ instead of _Code_. The `In [ ]` on the left will disappear.
3. Copy the following text into the new cell and render it with the ![run](images/jupyter-bouton1.png) button.


`# Welcome!`

`Here is my **first** *Jupyter* notebook.`



### *Insert text*

### Code cell

1. Select the *Insert code* cell by clicking on it, then press the **+** button in the toolbar.
2. Do not change the cell type!
3. Copy and run the code below.

`print("Hello World!")`

Well done, you have created your first program!

### *Insert code*

## Save

After working on a notebook, we will of course want to save our work before quitting. There are several ways to save the notebook we are working on:
- press the file icon, the first button on the left
- or select *Save and checkpoint* from the File menu
- or press Ctrl+S


#### ... but if we are in the virtual machine created by Binder 

nothing will be saved to our computer.

Luckily, we have another option:


## Download and upload

We can download the notebook in different formats (menu `File > Download as`). We suggest always saving a copy in the .ipynb format, in addition to any other formats you might want to use: this way, you will be able to work on the notebook again and export its content to new formats if needed.

To reopen a notebook that you have downloaded to your computer, you can use the Upload function (see below).

## Create a new notebook

- Go back to the tab in your browser with the file manager (see the image above) and navigate to the folder in which you want to open a new file.
- Select the type of file you want to create from the 'New' dropdown menu on the right: for a Jupyter Notebook, select 'Python 3'.
- A new tab with the new file will open. Give it a name by double-clicking on the name 'Untitled' and typing your new name.
- That's it!

## More about Jupyter notebooks

Resources about Jupyter Notebook are available through

- the official [Jupyter](https://jupyter.org/) page
- the Jupyter Notebook [documentation](https://jupyter-notebook.readthedocs.io/en/stable/index.html)
- many other places on the web

For a DH-oriented introduction, have a look at the first chapters (*Getting setup* and *Getting started*) of [The Art of Literary Text Analysis](http://nbviewer.jupyter.org/github/sgsinclair/alta/blob/master/ipynb/ArtOfLiteraryTextAnalysis.ipynb) by Stéfan Sinclair & Geoffrey Rockwell.

# 2. Introduction to Python

## Summary

- data types
- variables
- functions
- open and read files
- import packages


## Data types

Data types tell the computer what a piece of data is, and what it can do with it.

You do not do the same things with words, or with numbers...

In [None]:
# compare
"20"+"20"

In [None]:
20+20

**String (str)**:
A sequence of characters (letters, punctuation, whitespace...)

**Integer (int) and Float (float)**:  
correspond to whole numbers (42) and decimal numbers (42.5)

**Boolean (bool)**: 
either `True` or `False`

In [6]:
# find the type of something
type("20")

str
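
As a small illustration of the four types listed above, the sketch below asks Python for the type of one example value of each kind, and then repeats the `+` comparison from the beginning of this section. The example values are arbitrary.

```python
# Check the type of a few literal values with type()
values = ["20", 20, 20.5, True]
type_names = [type(value).__name__ for value in values]
print(type_names)

# The same + operator behaves differently depending on the type
joined = "20" + "20"   # strings are concatenated
added = 20 + 20        # numbers are added
print(joined, added)
```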

## Variables
Variables give a name to a piece of data (a number, a string, etc.).


In [None]:
# Here is how you assign a value to a variable
# variable_name = variable_value
a = 3.14159
b = "hello world!" # you can also use single quotes: 'hello'
c = b



- Variables are used to store information to be accessed and manipulated in a computer program.
- You can think of it as a **label** pointing at something stored in memory.
- The value of a variable can be changed.
- You need to assign a value to a variable before you can use it.
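
The "label" idea from the list above can be checked with the `is` operator, which tells whether two names point at the very same thing in memory. This is only a minimal sketch with made-up values.

```python
b = "hello world!"
c = b                  # c is a second label for the same string

same_object = c is b   # both names point at the same object
print(same_object)

b = "goodbye"          # b now labels a new string; c is left untouched
print(b)
print(c)
```

Reassigning `b` moves the label `b` to a new value; it does not change what `c` points to.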

In [None]:
# 1. assign a value to variable x
x = 2+3

# 2. you can reuse and modify your variable
y = x*2
x = "+++ Divide By Cucumber Error. Please Reinstall Universe And Reboot +++"

# check the value of x and y
print(x)
print(y)

## Variables - Names

- The name of a variable can **contain** letters, numbers and underscore characters `_`
- The name of a variable **starts** with a letter or `_`
- It is case sensitive (`Test` and `test` are different variables)
- Use meaningful variable names (not `a` or `b`, but better `wordCount` or `word_count`)
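
One way to try out the naming rules above is to let Python check a few candidate names. The sketch below uses `str.isidentifier()` together with the standard `keyword` module; the candidate names are invented for the example. Note one extra rule that the list does not mention: reserved words such as `class` cannot be used as variable names.

```python
import keyword

candidates = ["word_count", "wordCount", "_hidden", "2nd_word", "word-count", "class"]

valid = {}
for name in candidates:
    # isidentifier() checks the allowed characters and the first character;
    # iskeyword() flags reserved words such as "class"
    valid[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", valid[name])
```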

## Errors: Don't Panic!

In [None]:
a = "The answer is "
b = 42
a+b

<img src="images/python-error-1.png"  style="display:block;margin-right:auto;margin-left:auto;">

Python gives you the information that you need to correct the error:
1. The sort of error that was made: there is a problem with a data type.
2. The error message: something should be a string, but instead Python got an integer.
3. The line of code where the error happened: it is highlighted with an arrow.

**Conclusion**: the variable `b` should be converted to a string if we want to eliminate the error at line 3 (`a+b`).
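
Following that conclusion, one possible fix is sketched below: `str()` converts the integer before the two pieces are joined. The variable name `sentence` is introduced here only for the example.

```python
a = "The answer is "
b = 42

# str() converts the integer to a string so that + can join the two
sentence = a + str(b)
print(sentence)

# an f-string is another common way to mix text and numbers
formatted = f"{a}{b}"
print(formatted)
```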

## Functions
This is how you give orders to the computer!

In [4]:
# assign your name to the variable
name = ""

# tell the computer to say hi
print("Hi "+name+"!")

Hi !


## Functions - Examples

- general: `print()`, `help()`
- data types: `type()`, `int()`, `str()`
- counting the length: `len()`

What goes inside the parentheses is called an **argument**. Arguments are pieces of information passed to the function so that it can do its job! Arguments can be optional or mandatory.

Functions usually send you back some information that can be stored in a variable, for later use:

In [None]:
a = "a short string"
b = len(a)
print(b)
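
To make the difference between mandatory and optional arguments more concrete, here is a short sketch using three built-in functions. The values are arbitrary; `sep` and the second argument of `round()` are the optional arguments being illustrated.

```python
words = ["a", "short", "string"]

# len() has one mandatory argument: calling len() with nothing raises a TypeError
number_of_words = len(words)

# print() accepts the optional argument sep, which sets the separator
print("a", "short", "string")           # the default separator is a space
print("a", "short", "string", sep="-")  # here the separator is a hyphen

# round() has an optional second argument: the number of decimals to keep
rounded = round(3.14159, 2)
print(number_of_words, rounded)
```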

## Methods
Methods are functions that apply only to a certain data type:
`str.upper()`, `str.find(sub[, start[, end]])`

These methods apply to the data type *string* (str). What you see in the parentheses are the arguments: `str.upper` takes no argument, but `str.find` needs at least one argument (*sub*) and has two optional arguments (*start* and *end*).

Here is a list of methods available for strings: <https://docs.python.org/3/library/stdtypes.html#string-methods>
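
The two methods described above can be tried on a short string. In this sketch the string and the variable names are made up; the point is the call with no argument, with one argument, and with the optional *start* argument.

```python
title = "good omens"

# upper() takes no argument
shouted = title.upper()

# find() needs the substring to look for; it returns the position of the
# first match, or -1 when there is no match
first_o = title.find("o")
missing = title.find("z")

# the optional start argument makes the search begin further along
next_o = title.find("o", first_o + 1)

print(shouted, first_o, next_o, missing)
```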

## Examples - Working with Strings

In [None]:
message = "Le petit chien est sur la pente fatale!!!"

# replace characters
a = message.replace("petit chien", "grand chat")

# count occurrences of a substring
b = message.count("!")

# split a string at each space character
c = message.split(" ")

# join strings together
d = "-".join(c)

# print a variable to see the result

## Functions - Exercises

The lines starting with `#` are comments: they are ignored by Python. Here, the comments give you the instructions for each exercise.

Complete each cell by adding code below the comment, and then run the cell (click on 'Run' or 'Exécuter').

In [17]:
# create a variable that is a string


In [18]:
# create a new variable that transforms the first one into a number


In [20]:
# find the length of both variables - what happens? Why?


TypeError: object of type 'int' has no len()

In [None]:
# ask the computer for help about the second variable!


### Functions - summary
- `print()`: displays a variable
- `help()`: shows help about Python. If you give a variable name or a function name, you will receive more information about the variable data type, or how you can use the function (e.g. `help(a)` or `help(len)`).

- `type()`: gives you the type of a variable
- `int()`: converts a variable into an integer
- `str()`: converts a variable into a string

- `len()`: gives you the length of a variable (for a string, the number of characters it contains)

## Going further...

To understand how *CollateX* works, it can be useful to know a few other data types:

1. lists
2. dictionaries
3. objects

### List
The items or elements of a list are **ordered** in a defined sequence. A string behaves in a similar way: it is an ordered sequence of characters.
The elements of a list can be accessed via a number that indicates their position inside the list (the **index**).

In [None]:
cities = ["Vienna", "London", "Paris", "Berlin", "Zurich"] # square brackets
world = "world"

# the index starts at 0
print(world[0])
print(cities[1])

In [None]:
# add item
cities.append("Lausanne")

# remove item
cities.remove("Zurich")

cities

In [None]:
# you can have a list of lists
lists = [[1, 2, 3],
        ["Rincewind", "Ridcully", "Hex"],
        [42, "don't panic!"]]
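
The indexing rules above can be explored a little further. The sketch below, with made-up data, shows how to reach the last item, how to look up the position of an item, how to index into a list of lists, and what happens when the index does not exist.

```python
cities = ["Vienna", "London", "Paris"]

# len() gives the number of items, so the last valid index is len() - 1
last_city = cities[len(cities) - 1]

# index() is the reverse lookup: it returns the position of an item
paris_position = cities.index("Paris")

# in a list of lists, use one index per level
lists = [[1, 2, 3], ["Rincewind", "Ridcully", "Hex"]]
wizard = lists[1][0]

# asking for a position that does not exist raises an IndexError
out_of_range = False
try:
    cities[3]
except IndexError:
    out_of_range = True

print(last_city, paris_position, wizard, out_of_range)
```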

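### Dictionary

A dictionary is a collection of key/value pairs. Values are looked up by key rather than by position (since Python 3.7, a dictionary also remembers the order in which its pairs were inserted).

The **key** is often a string, but it can be any hashable value, such as a number.

The **value** can be anything!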


In [None]:
# a dictionary is in curly brackets
book = {"title": "Good Omens",
        "author": ["Terry Pratchett", "Neil Gaiman"],
        "year": 1990
       }
# you can access a value thanks to the key
book["author"]

In [None]:
# add a key/value pair
book["publisher"] = "Gollancz"

# remove a key/value pair
book.pop("year")

book
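
A few more everyday dictionary operations are sketched below, on a shortened version of the same example. The fallback value `"unknown"` is an arbitrary choice for the illustration.

```python
book = {"title": "Good Omens", "year": 1990}

# "in" tests whether a key exists
has_publisher = "publisher" in book

# get() returns a fallback value instead of raising a KeyError
publisher = book.get("publisher", "unknown")

# len() counts the key/value pairs, keys() lists the keys
number_of_pairs = len(book)
keys = list(book.keys())

print(has_publisher, publisher, number_of_pairs, keys)
```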

### Objects

Python is an object-oriented programming language. Almost everything in Python is an object, with its properties and methods, e.g. strings, integers, lists, etc.

We have seen that **1 variable = 1 data type** (more or less, there are some more complex data types like lists and dictionaries).

But sometimes a single data type cannot describe something properly. **Classes** let you create your own objects and write methods for your objects.

For example, you can imagine a Book object that has 3 or more properties: title, author, date...

Imagine that you have a lot of books. Now you can do different things:

- order them alphabetically by titles
- ask how many books were written by author X
- ask which book was published first


In [None]:
# example of a Book class definition

class Book:
    # when Book() is called, it creates a Book object (an "instance")
    def __init__(self, t, a, y):
        self.title = t # set title
        self.author = a # set author
        self.year = y # set year
    
# the Book() function needs three arguments, in the proper order: 
#1. title, 2. author, 3. date of publication
b = Book("Alice's adventures in Wonderland", "Lewis Carroll", 1865)

# check the author of the book you just created
b.author
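
The three operations mentioned earlier (ordering by title, counting by author, finding the earliest book) could look like the sketch below once several Book objects are stored in a list. This is just one possible way to write them: the class is redefined with longer parameter names so that the example runs on its own, and the book list is invented for the illustration.

```python
class Book:
    def __init__(self, title, author, year):
        self.title = title
        self.author = author
        self.year = year

books = [
    Book("Through the Looking-Glass", "Lewis Carroll", 1871),
    Book("Alice's Adventures in Wonderland", "Lewis Carroll", 1865),
    Book("Good Omens", "Terry Pratchett & Neil Gaiman", 1990),
]

# order the books alphabetically by title
titles = [book.title for book in sorted(books, key=lambda book: book.title)]

# count the books written by a given author
carroll_count = len([book for book in books if book.author == "Lewis Carroll"])

# find the book that was published first
oldest = min(books, key=lambda book: book.year)

print(titles)
print(carroll_count)
print(oldest.title)
```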

**Why does this matter?**  
We are not going to create objects ourselves, but it can be useful to understand this concept because that is how the CollateX module is organized.

**Exercise:**  
What properties should a "Collation" class have? What other classes would we need?

## Additional materials

- Python course: [data types and variables](https://www.python-course.eu/python3_variables.php)
- The Python Tutorial: [read and write files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files)
- Real Python: [understanding error messages](https://realpython.com/python-traceback/#what-are-some-common-tracebacks-in-python)
- W3Schools Python tutorial: [objects and classes](https://www.w3schools.com/python/python_classes.asp)