# Synthesis of New Materials in Industry Based on Machine Learning

The goal of this lab is to build a neural network capable of predicting the behavior of new materials from a set of known materials. Deeper insights can be found in this [paper](https://roboticsmind.github.io/public/research/aaai-2021/osmani2021augmented.pdf), but we will restrict ourselves to reproducing the baseline model.

## Application Description

<p align="center">
<img src="https://user-images.githubusercontent.com/8298445/107156058-5baf2c00-697c-11eb-95a2-42dbf1db8612.png" height="300px"/>
</p>

**Red Pigment** (the first known material in our application) is a natural form of mineral composed mainly of iron oxide;

**Calamine Oxide** (the second known material in our application) is a steel by-product obtained during continuous casting or heating of slabs and billets;

**The synthesis of new materials** relies on adding calamine to the process, which ensures a sufficient quantity of $Fe_2O_3$ and increases the density of the synthesized pigment.
The goal is to obtain materials with desirable qualitative properties, including optical and ferromagnetic properties.

## Dataset Description

* The dataset consists of thermal analyses of raw materials collected with an SDT-Q600 version 20.9 build 20 industrial instrument that monitors the calcination of the mixtures;
* Various signals are monitored by the instrument, including temperature (°C), weight (mg), heat flow (mW), temperature difference ($\mu V$), sample purge flow (mL/min), etc.;
* In addition to the theoretical curves of the red pigment ($pig$) and the calamine oxide ($cala$) that were obtained separately, we perform calcination of mixtures with various percentages, $p_i\in\{5, 10, 15, 20, 25, 35\}$, of additional calamine oxide to the red pigment.


After setting up the integration with your Drive and GitHub, you will find the data inside the `./data/` folder. It has the following structure:
```
./data/
    ├── ATG10pc.csv
    ├── ATG15pc.csv
    ├── ATG20pc.csv
    ├── ATG25pc.csv
    ├── ATG35pc.csv
    ├── ATG5pc.csv
    ├── ATGPig.csv
    └── ATGcala.csv
```

A visualization of the quantities being monitored in this application can be found below (after loading the dataset). Here is the evolution of the monitored signals for the Red Pigment.

<p align="center">
<img src="https://user-images.githubusercontent.com/8298445/107157765-3d4e2e00-6986-11eb-936b-7ea98561c449.png" title="Red Pigment --- Evolution of the Calcination Process" height="300px"/>
<br/>
<b>Figure:</b> Simultaneous thermal and mass loss analysis of (a) red pigment and (b, c) binary mixture of red pigment and additional calamine percentages. The effect of the temperature augmentation on the behavior of the red pigment is shown via weight, derivative weight, temperature difference, and heat flow curves.
</p>

## Integration between Google Drive --- Colab --- GitHub

**Ex. 1** Perform the integration between Drive, Colab, and GitHub.
The repository that has to be cloned is at this address: `https://github.com/roboticsmind/2021-material-engineering`.

After connecting to the drive, use the address of the repository to clone it.

**Ex. 2** Go inside the cloned repository and check that everything has been correctly cloned. Be aware of the difference between `!cd` and `%cd`.
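The difference can be illustrated with a plain-Python analogy. This is only a sketch: `!cd` behaves like a command run in a short-lived child shell, while `%cd` changes the directory of the notebook process itself. The temporary directory below is a stand-in for the cloned repository.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # stand-in for the cloned repository folder

# Analogous to `!cd target`: the command runs in a child shell that exits
# right away, so the current process keeps its working directory.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_subshell = os.getcwd()

# Analogous to `%cd target`: the current process itself changes directory.
os.chdir(target)
after_chdir = os.getcwd()

print("unchanged after subshell:", after_subshell == start)
print("changed after chdir:", os.path.samefile(after_chdir, target))

os.chdir(start)  # restore the original working directory
```

In a notebook, this means that only `%cd` affects the cells that follow.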

## Loading Libraries

**Ex. 3** Load the _tensorflow_, _keras_, and _numpy_ libraries.

**Ex. 4** Here, we want to control the randomness that certain libraries rely upon, so that results can be reproduced from one run to the next. Define a seed and use it to initialize _numpy_'s and _tensorflow_'s random generators. For tensorflow see [tf.random](https://www.tensorflow.org/api_docs/python/tf/random/set_seed).
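One possible way to do this is sketched below. The seed value `42` is arbitrary, and the TensorFlow import is wrapped in a `try` block only so that the snippet still runs on a machine without TensorFlow; in Colab the import can be written directly.

```python
import numpy as np

SEED = 42  # arbitrary value chosen for illustration

np.random.seed(SEED)

try:
    import tensorflow as tf
    tf.random.set_seed(SEED)  # seeds TensorFlow's global random generator
except ImportError:
    pass  # TensorFlow not available; only numpy is seeded

# Quick check: reseeding numpy reproduces the same draws.
np.random.seed(SEED)
first = np.random.rand(3)
np.random.seed(SEED)
second = np.random.rand(3)
print(np.array_equal(first, second))
```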


## Loading and Visualizing the Dataset

```python
# load data
curves = [
  'ATGPig',
  'ATG5pc',
  'ATG10pc',
  'ATG15pc',
  'ATG20pc',
  'ATG25pc',
  'ATG35pc',
  'ATGcala'
]
columns = [
  'time',
  'temperature',
  'weight',
  'heat_flow',
  'temperature_difference_C',
  'temperature_difference_V',
  'Sample_purge_flow',
  'Unknown'
]
```

**Ex. 5** Load the `pandas` library and use the function `read_csv()` to read one of the `.csv` files contained in `data/`. Provide the necessary parameters, in particular `delimiter`, `header`, and `names` (check the documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)).
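The three parameters can be tried out on a small in-memory sample before touching the real files. In this sketch the two rows are made up for illustration, and the tab delimiter and the absence of a header row are assumptions; open one of the real files to check which delimiter it actually uses.

```python
import io

import pandas as pd

columns = [
    'time', 'temperature', 'weight', 'heat_flow',
    'temperature_difference_C', 'temperature_difference_V',
    'Sample_purge_flow', 'Unknown',
]

# Two made-up rows standing in for a real file (NOT actual measurements).
toy_csv = (
    "0.0\t25.0\t10.0\t0.1\t0.01\t1.0\t100.0\t0\n"
    "0.5\t26.0\t9.9\t0.2\t0.02\t1.1\t100.0\t0\n"
)

df = pd.read_csv(
    io.StringIO(toy_csv),  # replace with a path such as './data/ATGPig.csv'
    delimiter="\t",        # assumed delimiter
    header=None,           # assumed: the file has no header row
    names=columns,         # column labels taken from the list above
)
print(df.head())
```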

**Ex. 6** `read_csv()` returns a `DataFrame`. Check the documentation of `DataFrame` [here](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) and display its contents.

**Ex. 7** Write a function that loads the `.csv` files contained in the `data/` folder and returns a dictionary having the columns, defined above, as keys.

**Ex. 8** Take a look inside the returned dictionary.

**Ex. 9** Plot the different curves using the plot utilities provided by `matplotlib`. (Check the documentation of `matplotlib` [here](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.plot.html)).

## Data Preparation

### Reconstruction of ATG35pc with f(ATGPig, ATGcala)

Consider the following figure, which depicts the state space defined by the experiments (calcination) conducted at different percentages of additional calamine oxide. The axis $u(t)$ corresponds to the temperature applied to the mixture.

<p align="center">
<img src="https://user-images.githubusercontent.com/8298445/107158505-eb5bd700-698a-11eb-895e-1439281b9c3f.png" height="200px"/>
</p>

In this lab, we will use ATGPig (0%) and ATGcala (100%) in order to reconstruct (or predict) the behavior of the mixture ATG35pc (35% of additional calamine oxide).


**Ex. 10** In the following cell, define two variables `INPUT_PERCENTAGES` and `TARGET_PERCENTAGE` which should contain the input and output percentages of the neural network.

**Ex. 11** The target quantities (outputs of our model) are _weight_ and _temperature_ (or _temperature difference_). On the other hand, the input quantities to our neural network will be _weight_, _heat flow_, _temperature difference_, and _sample purge flow_.

In addition to the input and output percentages you defined above, define two other variables `INPUTS` and `OUTPUTS` which should contain the quantities of interest used in the neural network.

**Ex. 12** Define a function called `get_data()` which takes as input a pandas dataframe and returns 2 dictionaries (`train_inputs` and `train_outputs`) whose keys have the following format (according to the variables defined above).

* for `train_inputs`:
```python
train_inputs={
    # pig
    'ATGPig_weight': [...data here...],
    'ATGPig_heat_flow': [...data here...],
    'ATGPig_temperature_difference_C': [...data here...],
    'ATGPig_Sample_purge_flow': [...data here...],

    # cala
    'ATGcala_weight': [...data here...],  
    'ATGcala_heat_flow': [...data here...],
    'ATGcala_temperature_difference_C': [...data here...],
    'ATGcala_Sample_purge_flow': [...data here...],
}
```

* for `train_outputs`:
```python
train_outputs={
    # ATG35pc
    'ATG35pc_weight': [...data here...],
    'ATG35pc_temperature_difference_C': [...data here...],
}
```
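The key format above can be produced by combining a curve name with a quantity name. The sketch below is one possible reading of Ex. 10 to 12, not the reference solution: it assumes that the percentage variables hold curve names spelled as in the `curves` list, and that the loaded data is available as a dictionary mapping each curve name to its dataframe (`toy_data` is a random stand-in, not real measurements).

```python
import numpy as np
import pandas as pd

# Assumed values: curve names stand in for the 0%, 100%, and 35% mixtures.
INPUT_PERCENTAGES = ["ATGPig", "ATGcala"]
TARGET_PERCENTAGE = "ATG35pc"
INPUTS = ["weight", "heat_flow", "temperature_difference_C", "Sample_purge_flow"]
OUTPUTS = ["weight", "temperature_difference_C"]


def get_data(data):
    """Build the input and output dictionaries.

    `data` is assumed to map a curve name to the DataFrame of that curve.
    """
    train_inputs = {
        f"{curve}_{quantity}": data[curve][quantity].to_numpy()
        for curve in INPUT_PERCENTAGES
        for quantity in INPUTS
    }
    train_outputs = {
        f"{TARGET_PERCENTAGE}_{quantity}": data[TARGET_PERCENTAGE][quantity].to_numpy()
        for quantity in OUTPUTS
    }
    return train_inputs, train_outputs


# Toy stand-in for the loaded curves (random values, 8 points per curve).
rng = np.random.default_rng(0)
toy_columns = sorted(set(INPUTS + OUTPUTS))
toy_data = {
    curve: pd.DataFrame(rng.random((8, len(toy_columns))), columns=toy_columns)
    for curve in INPUT_PERCENTAGES + [TARGET_PERCENTAGE]
}

train_inputs, train_outputs = get_data(toy_data)
print(sorted(train_inputs))
print(sorted(train_outputs))
```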

**Ex. 13** Test your implementation and check the contents of the resulting dictionaries.

**Ex. 14** Using the `model_selection` module of `sklearn` (see its documentation), for example `ShuffleSplit`, generate two sets of indices that will be used to split the data points of a given curve into two parts (75% for training and 25% for validation).
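A minimal sketch of such a split with `ShuffleSplit` follows. The seed and the number of points are hypothetical values chosen for illustration; in the lab, the number of points would come from the length of a loaded curve.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

SEED = 42       # hypothetical seed, reuse the one defined in Ex. 4
n_points = 100  # hypothetical number of data points in a curve

# One shuffled split with 25% of the points held out for validation.
splitter = ShuffleSplit(n_splits=1, test_size=0.25, random_state=SEED)

# split() yields (train indices, test indices) pairs; take the only one.
placeholder = np.zeros((n_points, 1))
train_idx, valid_idx = next(splitter.split(placeholder))

print(len(train_idx), len(valid_idx))
```

The same two index arrays can then be applied to every quantity of a curve, so that inputs and outputs stay aligned point by point.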

**Ex. 15** Modify the function `get_data()` you defined above to return 4 dictionaries (`train_inputs`, `train_outputs`, `valid_inputs`, and `valid_outputs`) rather than the initial 2. Use the splits you constructed above with `ShuffleSplit`.

**Ex. 16** Print the shapes of the dictionaries returned by `get_data()`. Ensure that the sizes correspond to the splitting ratio (75/25) we asked you to use.

**Ex. 17** Propose some data visualizations.

**Ex. 18** Propose a simple model to solve the given problem.
