# <b>Welcome to Jupyter Notebook</b>

Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages. Nowadays Jupyter supports over 40 programming languages, including Python, R, Julia, and Scala. 
</br> Today we will have a little tutorial using Python.

<b>Python</b> is one of the most popular high-level programming languages. It was created by Guido van Rossum, and released in 1991. </br> The language is named after the Monty Python comedy troupe :)

<b>Python is commonly used for: </b>
- web development
- software development
- system scripting
- data analysis (including machine learning) and data visualization 

<b>Advantages of Python:</b>
- works on different platforms (Windows, Mac, Linux, etc.)
- has a simple syntax similar to the English language
- runs on an interpreter system, meaning that code can be executed as soon as it is written
- can be written in a procedural, object-oriented or functional style
- has a lot of pre-existing modules (a.k.a. libraries) that contain already defined functions for different tasks
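
As a small illustration of the three programming styles mentioned above, the sketch below computes the same average in a procedural, an object-oriented and a functional way. The numbers and the class name `Sample` are invented for this example.

```python
from functools import reduce

values = [1.0, 2.0, 3.0, 4.0]  # toy data

# Procedural style: an explicit loop that updates a running total.
total = 0.0
for v in values:
    total += v
mean_procedural = total / len(values)


# Object-oriented style: data and behaviour bundled together in a class.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


mean_oo = Sample(values).mean()

# Functional style: combine the values with a function, without mutable state.
mean_functional = reduce(lambda a, b: a + b, values) / len(values)

print(mean_procedural, mean_oo, mean_functional)
```

All three variants give the same result; which style to use is mostly a matter of readability for the task at hand.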

The <b>Jupyter Notebook</b> is a web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. </br> The name <b>Jupyter Notebook</b> also refers to the document format itself, which is based on JSON (JavaScript Object Notation). A notebook document contains a complete record of the user's session.
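
To make the JSON point concrete, here is a minimal sketch of what a notebook document looks like when it is stored. The field names follow version 4 of the notebook format; the cell contents are made up for illustration and are not taken from this tutorial's file.

```python
import json

# A tiny notebook document with one markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello world!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the JSON text that would be stored in an .ipynb file ...
text = json.dumps(notebook, indent=1)

# ... and parse it back, as any tool reading a notebook would do.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, it can be opened in any text editor, although editing it through the Jupyter interface is far more convenient.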

## Let's take a look at the interface

<img src="https://raw.githubusercontent.com/mavstrikova/Notebook_TP/main/Slide1.jpg">

To execute a cell you can also use the shortcut <code>Shift + Enter</code>. </br> Try to run the following cell:

In [None]:
print('Hello world!')

Now add your own cell below and introduce yourself (e.g. <code>print('Hello! My name is ...')</code>)

# <b>Data analysis with Jupyter Notebook</b>

Now let's try to repeat your previous analysis task using Python

## <b>Step 0. Load libraries</b>

Python has a lot of libraries that contain ready-made functions that are convenient to use. These libraries can be external (published online), or you can create your own collection of functions. </br> <i>Little tip: before coding your own function, do not hesitate to search online for a library that already covers your task</i>

To use functions from a library in your code, you first need to import it. To do so, run the following cell:

In [None]:
import os #library to interact with the operating system
import pandas as pd #library to import and analyze data
import numpy as np #library for scientific computing
import matplotlib.pyplot as plt #plotting tools

You can find more details on these libraries online:
- os          https://docs.python.org/3/library/os.html
- pandas      https://pandas.pydata.org/pandas-docs/stable/index.html
- numpy       https://numpy.org/doc/stable/index.html
- matplotlib  https://matplotlib.org/stable/index.html

## <b>Step 1. Produce & import data</b>

First of all, we need to identify the list of directories to iterate through. To do so we will use the function <code>os.listdir()</code>. This function returns a list with the names of everything (files and folders) in the given folder, here the current one.

In [None]:
lst = os.listdir('./')
print(lst)

</br>But how can we select only the directories we are interested in?</br>

In [None]:
runs = [i for i in os.listdir('./') if i[0:3]=='???'] #Get names of run folders
runs.sort() #Sort by name
print(runs)
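
A name prefix alone also matches ordinary files. As a minimal sketch, the cell below combines `str.startswith()` with `os.path.isdir()` inside a throwaway temporary folder. The names `demo01`, `demo02`, `other` and `demo_notes.txt` are invented for the example and are not part of your working directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small, made-up folder layout to experiment with.
    os.mkdir(os.path.join(tmp, 'demo02'))
    os.mkdir(os.path.join(tmp, 'demo01'))
    os.mkdir(os.path.join(tmp, 'other'))
    open(os.path.join(tmp, 'demo_notes.txt'), 'w').close()

    # Keep entries that start with the prefix AND are directories.
    selected = sorted(
        name for name in os.listdir(tmp)
        if name.startswith('demo') and os.path.isdir(os.path.join(tmp, name))
    )

print(selected)
```

The text file is dropped even though its name starts with the same prefix, because it is not a directory.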

</br>Now let's produce dd.out files for each run:

In [None]:
#Iterate through directories and execute plumed command line
for i in runs:
    cmd = 'cd ./'+i+'; plumed driver --plumed ../anal/plumed.dat --mf_xtc ./noPBC.xtc > /dev/null; cd ../' #Generate command that we want to run in terminal
    print(cmd) #print the command
    os.system(cmd) #execute the command
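
`os.system()` does not tell you much when a command fails. As a hedged alternative, `subprocess.run()` from the standard library can run a command inside a chosen folder (`cwd=`) and raise an error on failure (`check=True`). The sketch below runs a harmless Python one-liner as a stand-in command, so it works anywhere; adapting it to your real command is left to you.

```python
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in command: ask a child Python process to print a message.
    cmd = [sys.executable, '-c', "print('done')"]

    result = subprocess.run(
        cmd,
        cwd=workdir,          # run inside this folder, no 'cd' needed
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )

print(result.returncode, result.stdout.strip())
```

Passing the command as a list avoids building one long string by hand, and `cwd=` replaces the `cd` into and out of each folder.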

</br>To read the files we will use the <code>pandas.read_csv()</code> function. It returns a data frame containing the data.

In [None]:
pd.read_csv('./run01/dd.out', sep=r'\s+', skiprows=[0], names=['Time', 'Distance']) #raw string keeps the backslash in the pattern
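
To see what `sep`, `skiprows` and `names` do, here is a small sketch that reads a made-up, whitespace-separated table from a string instead of a file. The header line and the numbers are invented and only mimic the general shape of such a table.

```python
import io
import pandas as pd

# Invented example text: one header line, then two whitespace-separated columns.
text = """#! header line to skip
0.0   1.25
1.0   1.50
2.0   1.75
"""

df = pd.read_csv(
    io.StringIO(text),           # file-like object standing in for a real file
    sep=r'\s+',                  # one or more whitespace characters between columns
    skiprows=[0],                # skip the first line (the header)
    names=['Time', 'Distance'],  # supply our own column names
)
print(df)
```

Without `skiprows=[0]` the header line would be read as data, and without `names` the first data row would be used as column names.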

</br>Now let's read all the files and put them together in a dictionary.

In [None]:
data = {} #Create an empty dictionary
#Iteratively read the tables and store them in the dictionary, one entry per run
for i in runs:
    data[i] = pd.read_csv('./'+i+'/dd.out', sep=r'\s+', skiprows=[0], names=['Time', 'Distance'])

In [None]:
print(data)
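
Printing a dictionary of data frames gets long quickly. One possible way to get an overview, sketched here with two tiny invented tables under the hypothetical keys `a` and `b`, is to stack them with `pandas.concat()`, which turns the dictionary keys into an extra index level.

```python
import pandas as pd

# Two tiny invented tables standing in for the per-run data frames.
toy = {
    'a': pd.DataFrame({'Time': [0.0, 1.0], 'Distance': [1.0, 1.2]}),
    'b': pd.DataFrame({'Time': [0.0, 1.0], 'Distance': [2.0, 2.4]}),
}

# Dictionary keys become the outer index level, named 'run' here.
combined = pd.concat(toy, names=['run', 'row'])
print(combined)

# Select one table back out by its key.
print(combined.loc['b'])
```

The dictionary itself is still the simpler structure for looping over runs; the stacked table is mainly handy for inspection.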

## <b>Step 2. Plot data</b>

To plot our data we will use the <code>matplotlib.pyplot.plot()</code> function. </br> Let's first decide what goes on the X and Y axes. <i>Hint: refer to the column names in your table</i>

In [None]:
X = 'WHAT WILL BE YOUR X AXIS'
Y = 'WHAT WILL BE YOUR Y AXIS'

In [None]:
plt.plot(data['run01'][X],data['run01'][Y]) #Plot for run01

Now let's try to plot everything together.

In [None]:
fig, ax = plt.subplots(figsize = [15, 8]) #Create figure object and define size
#Iteratively plot data for each run
for i in runs:
    plt.plot(data[i][X], data[i][Y], label = i)
#Set axis labels
plt.xlabel(X)
plt.ylabel(Y)
#Show plot legend
plt.legend()
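
The cell above mixes the `ax` object with `plt` calls, which works because `plt` acts on the current axes. An equivalent sketch that uses only the axes object is shown below, on synthetic curves generated with NumPy. The labels `curve 0` to `curve 2` and the axis titles are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0.0, 10.0, 50)  # synthetic x values

fig, ax = plt.subplots(figsize=(8, 4))
for k in range(3):
    # Each synthetic curve is a sine wave shifted upwards by k.
    ax.plot(t, np.sin(t) + k, label=f'curve {k}')

ax.set_xlabel('x (arbitrary units)')  # axis labels are set on the axes object
ax.set_ylabel('y (arbitrary units)')
ax.legend()
```

Working through `ax` becomes more useful once a figure has several panels, because each call then states explicitly which panel it changes.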

</br>Looks a bit messy, doesn't it? </br> Try to select data for your plot so that it shows two distinct states and one trajectory where a transition occurs.

In [None]:
fig, ax = plt.subplots(figsize = [15, 8])
plt.plot(data['run??'][X], data['run??'][Y], label = 'run??')
plt.plot(data['run??'][X], data['run??'][Y], label = 'run??')
plt.plot(data['run??'][X], data['run??'][Y], label = 'run??')
#Set axis labels
plt.xlabel(X)
plt.ylabel(Y)
#Show plot legend
plt.legend()
#Save figure
plt.savefig('my_beautiful_plot.png')

## <b>Step 3. Compute average value</b>

We will use the <code>numpy.mean()</code> function to compute the average distance.

In [None]:
np.mean(data['run01']['Distance'])
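
For a pandas column, `np.mean()` and the column's own `.mean()` method give the same number. A quick check on invented values:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Distance': [1.0, 2.0, 3.0, 6.0]})  # invented values

mean_np = np.mean(toy['Distance'])  # NumPy function applied to the column
mean_pd = toy['Distance'].mean()    # equivalent pandas method

print(mean_np, mean_pd)
```

Either form is fine; the method form is often shorter when you are already working with a data frame.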

To store the average distances for all runs, let's put them in a dictionary. </br> <i>Hint: We already created a dictionary in Step 1; try to do it yourself now</i>

In [None]:
averages = {}
for i in runs:
    ## PUT YOUR CODE BELOW ##

In [None]:
print(averages)

</br> Now let's put it in a data frame and save it as a .csv file

In [None]:
pd.DataFrame.from_dict(averages, orient='index', columns=['Average distance'])

In [None]:
avg = pd.DataFrame.from_dict(averages, orient='index', columns=['Average distance']) 
avg.to_csv('average_dist.csv') #save data frame as .csv
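
To check what ends up in the file, you can read it back. The sketch below does the round trip in memory with an invented dictionary (the keys `a` and `b` are hypothetical), so no file is written.

```python
import io
import pandas as pd

toy_averages = {'a': 1.5, 'b': 2.5}  # invented values

table = pd.DataFrame.from_dict(toy_averages, orient='index',
                               columns=['Average distance'])

buffer = io.StringIO()
table.to_csv(buffer)   # same kind of call as above, but into a text buffer
buffer.seek(0)         # rewind so the buffer can be read from the start

# index_col=0 restores the dictionary keys as the row index.
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

With a real file, replace the buffer by the file name in both `to_csv()` and `read_csv()`.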

# Congratulations, this is the end of the tutorial!

P.S. If you are interested, here are some useful links:
- Python course for beginners: https://www.coursera.org/learn/python?specialization=python
- Tutorial on data analysis with pandas: https://mlcourse.ai/articles/topic1-exploratory-data-analysis-with-pandas/
- Tutorial on plotting data with matplotlib: https://matplotlib.org/stable/tutorials/introductory/pyplot.html
- More about Python applications in chemistry: https://pythoninchemistry.org/

<center><i>This nice tutorial was created by</br> Mariia Avstrikova </i></center>