# <font style="font-family:roboto;color:#455e6c"> Data analysis and workflows in Materials science </font>  

## <font style="font-family:roboto;color:#455e6c"> Part I: Introduction to Python </font>  

### Why do you need Python for data analysis?

- **Easy to learn**: clean and simple, easy to read, intuitive. You will start writing code in about a minute.
- **Reproducibility**: writing automated scripts to analyse data ensures reproducibility.
- **Versatile and extensible**: many useful libraries for scientific computing and data analysis, such as numpy, scipy, matplotlib and pandas.
- **Drawback**: pure Python code usually runs much slower than compiled languages such as C++ and Fortran.

### The Jupyter Notebook

* interactive editor for Python (and other languages)

#### Some useful shortcuts

* `Enter` on a cell to edit it
* `Esc` to stop editing
* `Alt+Enter` to execute the cell and insert a new one below (`Shift+Enter` executes and moves to the next cell)
* `A` to insert cell above
* `B` to insert cell below

### Typing code

In [1]:
# You can put comments by putting "#" first

Use `print` to display output:

In [3]:
print("Hello, World!")

Hello, World!


Indentation is important!<br>
This works:

In [6]:
if 5 > 2:
    print("Five is greater than two!")

Five is greater than two!


This does not:

In [4]:
if 5 > 2:
print("Five is greater than two!")

IndentationError: expected an indented block (<ipython-input-4-a314491c53bb>, line 2)

### Variables in python

Python has five standard data types:

* Number - integer, float, complex, boolean
* string
* list
* tuple
* dictionary

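Variables can have almost any name, except for reserved Python keywords such as `class`, `def` and `lambda`. Built-in names such as `int` are not reserved, but overwriting them hides the built-in, so avoid using them as variable names.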
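A quick way to check whether a name is reserved is the standard-library `keyword` module. The sketch below tests a few names; built-in names such as `int` are not keywords, although using them as variable names is still best avoided.

```python
import keyword

# keyword.iskeyword() reports whether a string is a reserved Python keyword
names = ["class", "def", "lambda", "int"]
reserved = {name: keyword.iskeyword(name) for name in names}

print(reserved)
# {'class': True, 'def': True, 'lambda': True, 'int': False}
```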

Python infers the data type of a variable from the value assigned to it, so you do not need to declare types.

In [20]:
# Examples
name = "Ada Lovelace"
a = 1
b = 1.23
c = True
d = type(c)

print(name, a, b, c, d)

Ada Lovelace 1 1.23 True <class 'bool'>
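
Because the type belongs to the value and not to the name, the same variable can be rebound to values of different types. A minimal sketch (the variable names here are only for illustration):

```python
value = 1
first_type = type(value)    # int

value = 1.23                # the same name now refers to a float
second_type = type(value)

value = "one"               # and now to a string
third_type = type(value)

print(first_type, second_type, third_type)
# <class 'int'> <class 'float'> <class 'str'>
```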


<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
Exchange the values of two variables

x = 5  
y = 2

to,

x = 2  
y = 5
</div>

### Sequences

**Lists**

In [22]:
numbers = [1, 2, 3, 4]
numbers = [1, 2, 3, 'four', True]

In [24]:
numbers[0]

1

In [25]:
numbers[-1]

True

In [29]:
len(numbers)

5

**Tuples**

In [26]:
numbers = (1, 2, 3, 4)
numbers = (1, 2, 3, 'four', True)

<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
What is the difference between lists and tuples?

a_list = [1,2,3,4]  
a_tuple = (1,2,3,4)

try to change the first element of both to 'one'

a_list[0] = 'one'
</div>

### Operations

#### Arithmetic operations: `+`, `-`, `*`, `/`, `%`, `**`, `//`

In [31]:
a = 2
b = 3

In [32]:
a+b, a-b

(5, -1)

In [33]:
a*b, a**b

(6, 8)

In [34]:
a/b, a//b

(0.6666666666666666, 0)

In [35]:
a%b

2

#### Comparison operators - `==`, `!=`, `>`, `<`, `>=`, `<=`

In [36]:
a == 2

True

#### Logical, membership and identity operators - `and`, `or`, `not`, `in`, `not in`, `is`, `is not`

In [37]:
a in [1, 2, 3, 4]

True

<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
What is the difference between "==" and "is"?

a = 500
b = 500

check the values of "a == b" and "a is b"
</div>

#### Dictionaries

A dictionary stores heterogeneous data as key-value pairs: each key maps to a corresponding value. It can be defined by

In [38]:
person = {"name": "Maria", "age": 34, "telephone": 23458991 }

In [39]:
person.keys()

dict_keys(['name', 'age', 'telephone'])

In [40]:
person.items()

dict_items([('name', 'Maria'), ('age', 34), ('telephone', 23458991)])
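
Values are read and written through their keys. The sketch below reuses the `person` dictionary from above; the `"email"` and `"city"` keys and the value `"Berlin"` are made up for illustration.

```python
person = {"name": "Maria", "age": 34, "telephone": 23458991}

# Look up a value by its key
print(person["name"])                    # Maria

# .get() returns a fallback instead of raising KeyError for a missing key
email = person.get("email", "unknown")   # "email" is a hypothetical key
print(email)                             # unknown

# Assigning to a new key adds an entry; assigning to an existing key overwrites it
person["city"] = "Berlin"                # hypothetical entry
person["age"] = 35

# Loop over all key-value pairs
for key, value in person.items():
    print(key, "->", value)
```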

### Control flows

**`if`, `else` and `elif`**

In [42]:
x = 3.5
if x <= 1.0:
    print("low")
elif (x > 1.0) and (x < 3.0):
    print("average")
elif (x >= 3.0):
    print("high")
else:
    print("invalid")

high


**`for`** loops

In [43]:
x = [1,2,3,4,5]

In [44]:
for i in x:
    y = i**2 + 3
    print(y)

4
7
12
19
28


**Controlling loops with `break` and `continue`**

Find the first three even numbers below 10

In [45]:
even_numbers = []
for n in range(1, 10):
    # skip odd numbers
    if (n%2) != 0:
        continue
    even_numbers.append(n)
    if len(even_numbers) == 3:
        break

In [46]:
even_numbers

[2, 4, 6]

<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
Write a loop that calculates the first 6 terms of the Fibonacci sequence

term1 = 0  
term2 = 1
    
for x in range..
</div>

### Functions

Functions are an integral part of any programming language. A function is used to take some values, do a task and return the required information.

In [None]:
def add_numbers(x, y):
    """
    adds two numbers
    """
    s = x+y
    return s

In [None]:
add_numbers(2, 3)

The above function takes two values as input, `x` and `y`. They are the **arguments** of the function. It returns a calculated value `s`, which is the **return value**. A function does not have to return a value, and it can have **keyword arguments** with default values.

In [47]:
def add_numbers(x, y=3):
    """
    adds two numbers
    """
    s = x+y
    return s

In [None]:
add_numbers(2), add_numbers(2, y=4)
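
A function without a `return` statement still hands something back to the caller: the special value `None`. The sketch below uses a hypothetical function `report_sum` to show this, and also that keyword arguments can be passed in any order.

```python
def report_sum(x, y=3):
    """
    prints the sum of two numbers instead of returning it
    """
    print("sum is", x + y)

result = report_sum(2)             # prints: sum is 5
print(result)                      # None

# keyword arguments can be given in any order
result_kw = report_sum(y=10, x=1)  # prints: sum is 11
```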

<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
Convert the code you wrote for Fibonacci sequence to a function
</div>

### Classes

A `class` defines a type of object and attaches attributes and functions to it. Let us build a very simple example. A circle can be defined by its radius, from which other attributes such as circumference and area can be derived.

Before we start, it is beneficial to think about what attributes a class `Circle` should have. A class can have attributes and associated functions (methods). A good example of an attribute is `radius`. An associated function could be `get_area`, which calculates the area of the circle from the known `radius`.

In [48]:
import numpy as np  # provides np.pi, used in the methods below

class Circle:
    """
    Circle class to hold properties of a circle
    """
    def __init__(self, radius=None):
        
        self.radius        = radius
        #other variables are set to None
        self.area          = None
        self.circumference = None
    
    def get_area(self):
        """
        Calculate area
        """
        self.area = np.pi * self.radius**2

    def get_circumference(self):
        """
        Calculate circumference
        """
        self.circumference = 2.0 * np.pi * self.radius

In [5]:
small_circle = Circle(radius=8)

In [6]:
big_circle = Circle(radius=24)

Calculate the area and circumference of the small circle

In [7]:
small_circle.get_area()
small_circle.get_circumference()

You can access the attributes through the object

In [8]:
small_circle.area

201.06192982974676

In [9]:
small_circle.circumference

50.26548245743669

In [10]:
small_circle.radius

8
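
Each object keeps its own copy of the attributes, and `area` stays `None` until `get_area()` is called on that particular object. The sketch below repeats the class in compact form so that it runs on its own; as an assumption for self-containment it uses `math.pi` from the standard library in place of `np.pi`.

```python
import math

class Circle:
    """
    Circle class to hold properties of a circle
    """
    def __init__(self, radius=None):
        self.radius = radius
        self.area = None
        self.circumference = None

    def get_area(self):
        self.area = math.pi * self.radius**2

    def get_circumference(self):
        self.circumference = 2.0 * math.pi * self.radius

small_circle = Circle(radius=8)
big_circle = Circle(radius=24)

print(big_circle.area)    # None, nothing has been calculated yet
big_circle.get_area()
print(big_circle.area)    # now holds pi * 24**2

print(small_circle.area)  # None, the small circle is unaffected
```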

## Using libraries

The major strength of the Python ecosystem is its libraries. A large number of libraries are available, which you can import and use in your own code.

### Numpy

Numpy provides fast multi-dimensional arrays together with many mathematical and statistical functions that operate on them.

In [15]:
import numpy as np

Numpy arrays are faster and easier to handle than lists for numerical work, but they should be homogeneous: all elements must have the same data type.

In [16]:
A = np.array([1,2,3,4,5,6]) 
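
When the input mixes types, numpy does not keep them separate but converts every element to one common type. A small sketch of this behaviour (the array names are only for illustration):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])   # integers are converted to floats
mixed_text = np.array([1, "two", 3])    # everything is converted to strings

print(ints.dtype)               # an integer type (exact size depends on the platform)
print(mixed_numbers.dtype)      # float64
print(mixed_text.dtype.kind)    # U, meaning unicode strings
print(mixed_text)
```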

**Mathematical operations on numpy arrays**

Mathematical operations on numpy arrays are different from those on lists. They are vectorized.

In [38]:
A = np.ones(3)
A

array([1., 1., 1.])

In [39]:
B = A + A
B

array([2., 2., 2.])

The individual elements are added. For this to work, the two arrays must have the same shape (or shapes that numpy can broadcast, such as an array and a single number).

In [40]:
C = np.sqrt(B)
C

array([1.41421356, 1.41421356, 1.41421356])

In [41]:
D = B*B
D

array([4., 4., 4.])
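
To make the difference from lists concrete, the sketch below applies the same operators to a list and to an array, and then shows what happens when two arrays of different length are added. The variable names are only for illustration.

```python
import numpy as np

a_list = [1, 2, 3]
an_array = np.array([1, 2, 3])

# For lists, + concatenates and * repeats
print(a_list + a_list)       # [1, 2, 3, 1, 2, 3]
print(a_list * 2)            # [1, 2, 3, 1, 2, 3]

# For arrays, the same operators act on each element
print(an_array + an_array)   # [2 4 6]
print(an_array * 2)          # [2 4 6]

# Arrays of different length cannot be added element by element
try:
    np.ones(3) + np.ones(4)
    shapes_ok = True
except ValueError as err:
    shapes_ok = False
    print("ValueError:", err)
```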

### Pandas

Pandas or Python Data Analysis Library is one of the most useful tools for working with tabular data in python. The central aspect in pandas is a DataFrame. A DataFrame is a 2-dimensional data structure that can store data of different types. 

In [49]:
import pandas as pd
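
A DataFrame can also be built directly from a dictionary of columns, which makes the "different types" point easy to see. The sketch below types in a few of the Ti-Cu values from the table displayed further down by hand; the name `alloys` is just a placeholder.

```python
import pandas as pd

# Each key becomes a column; each column has its own data type
alloys = pd.DataFrame({
    "Ti": [100.0, 96.19, 94.56, 88.27],
    "Cu": [0.0, 3.81, 5.44, 11.73],
    "YM": [105, 115, 118, 120],
    "Phase": ["Ti-alpha", "Ti-alpha", "Ti-alpha", "Ti-alpha"],
})

print(alloys.dtypes)   # floats, integers and text side by side
print(alloys.shape)    # (4, 4)
```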

Use pandas to read a csv file

In [65]:
df = pd.read_csv("ti_alloy_data.csv")

In [64]:
df

| Unnamed: 0 | Ti | Cu | Publication | YM | Phase |
|---|---|---|---|---|---|
| 0 | 100.0 | 0.0 | j.jmbbm.2010.06.007 | 105 | Ti-alpha |
| 1 | 96.19 | 3.81 | 10.1016/j.matdes.2013.10.050 | 115 | Ti-alpha |
| 2 | 94.56 | 5.44 | 10.1016/j.matdes.2013.10.050 | 118 | Ti-alpha |
| 3 | 88.27 | 11.73 | 10.1016/j.matdes.2013.10.050 | 120 | Ti-alpha |


Some useful commands

In [52]:
df.head(), df.tail(), df.columns, df.shape

Unnamed: 0.1,Unnamed: 0,id,eid,formula,R1,R2,R3,F1,F2,F3,...,HV,HV_err,moe,moe_class,ph1,ph2,ph3,condition,comments,composition
0,0,1,Ti-23Nb-10Zr-2.0Fe,Ti0.7791Fe0.0234Zr0.0617Nb0.1358,1,10.1007/s10853-021-06002-0,salvador2021,1,0,0,...,252.0,8.0,12.88,meta,Ti-beta,,,ST-WQ,Young modulus obtained via tensile tests (Tabl...,Ti0.7791 Fe0.0234 Zr0.0617 Nb0.1358
1,1,2,Ti-27Nb-10Zr-1.5Fe,Ti0.7575Fe0.0156Zr0.0637Nb0.1632,1,10.1007/s10853-021-06002-0,salvador2021,1,1,0,...,229.0,2.0,11.65,meta,Ti-beta,,,ST-WQ,Young modulus obtained via tensile tests (Tabl...,Ti0.7575 Fe0.0156 Zr0.0637 Nb0.1632
2,2,3,Ti-31Nb-10Zr-1.0Fe,Ti0.7298Fe0.0138Zr0.0664Nb0.1899,1,10.1007/s10853-021-06002-0,salvador2021,1,0,0,...,218.0,5.0,12.08,meta,Ti-beta,,,ST-WQ,Young modulus obtained via tensile tests (Tabl...,Ti0.7298 Fe0.0138 Zr0.0664 Nb0.1899
3,3,4,Ti-11Nb-7Zr-3.5Fe,Ti0.8622Fe0.0349Zr0.041Nb0.0618,2,10.1016/j.matdes.2018.10.040,dalbo2018,1,1,0,...,358.0,3.0,13.78,meta,Ti-beta,Ti-omega,,ST-WQ,Young modulus obtained via tensile tests (Tabl...,Ti0.8622 Fe0.0349 Zr0.041 Nb0.0618
4,4,5,Ti-19Nb-6Sn-2.5Fe,Ti0.8376Fe0.0256Nb0.11Sn0.0268,2,10.1016/j.matdes.2018.10.040,dalbo2018,1,1,0,...,260.0,9.0,12.76,meta,Ti-beta,Ti-omega,,ST-WQ,Young modulus obtained via tensile tests (Tabl...,Ti0.8376 Fe0.0256 Nb0.11 Sn0.0268


<div class="admonition note" name="html-admonition" style="background: #FFEDD1; padding: 10px">
<p class="title"><b>Task</b></p>
Can you guess what data the csv file you just read contains?
</div>

In [54]:
df['YM'].max()

157.0

In [55]:
df['YM'].min()

45.0

In [57]:
df['YM'].idxmax()

131

In [60]:
df.iloc[131]['formula']

'Ti0.6975Cr0.0221Fe0.0722Zr0.2082'
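
One detail worth knowing: `idxmax()` returns the index *label* of the row, while `iloc` selects by integer *position*. The two coincide for the default 0, 1, 2, ... index, which is why the lookup above works. A minimal sketch of the difference, using a small hand-typed table with hypothetical row labels `"a"` to `"d"`:

```python
import pandas as pd

alloys = pd.DataFrame(
    {"Cu": [0.0, 3.81, 5.44, 11.73], "YM": [105, 115, 118, 120]},
    index=["a", "b", "c", "d"],   # hypothetical, non-numeric labels
)

label = alloys["YM"].idxmax()      # label of the row with the largest YM
print(label)                       # d
print(alloys.loc[label, "Cu"])     # 11.73, .loc selects by label

position = alloys["YM"].argmax()   # integer position of the same row
print(position)                    # 3
print(alloys.iloc[position]["Cu"]) # 11.73, .iloc selects by position
```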

**References**

- https://hida-datathon.github.io/2023-01-11-helmholtz-online/
- https://docs.python-guide.org/