<p align="center">
  <img src="img/GMITLOGO.jpg" width="500"/></p>

# GMIT, H.Dip in Data Analytics, Programming for Data Analysis Project 2018

## 1. Introduction

This repository contains all of the files pertaining to my 2018 project submission for the Programming for Data Analysis module of the GMIT H.Dip program in Data Analytics. All of the work contained within this repository was carried out over a four-week period in November and December 2018. This Jupyter notebook [1] contains the complete documentation for the project.

### 1.1 Project objective

The objective of this project is to synthesise a data set based on a real-world phenomenon. This requires investigating the phenomenon and then using the `numpy.random` package in Python [2] to simulate data based on it. The problem statement for the assignment is as follows [3]:

1. Choose a real-world phenomenon that can be measured and for which you could collect at least one-hundred data points across at least four different variables.
1. Investigate the types of variables involved, their likely distributions, and their relationships with each other.
1. Synthesise/simulate a data set as closely matching their properties as possible.
1. Detail your research and implement the simulation in a Jupyter notebook – the data set itself can simply be displayed in an output cell within the notebook.

### 1.2 Choice of real world phenomenon

I am currently working in operations at a malt production facility in Ireland. Malting is a process whereby raw cereal grains (mainly barley) are rehydrated to simulate planting in soil and then allowed to germinate for a number of days. During germination the cellular structure of the grain is modified, releasing sugars. The grain is then kilned, which stops germination and dries the grain, fixing its properties and improving the colour and flavour characteristics of the final product.

Worldwide, the largest consumers of malted barley are the brewing and distilling industries [4], which use 96% of all malt produced. The most important property of any brewing or distilling malt is its *extract potential*, which determines how much sugar is available for conversion to alcohol in the brewing (beer production) or distilling (whiskey production) process [5]. Distillers are particularly interested in *spirit yield*, which tells them how many litres of alcohol they are likely to get from each tonne of malt purchased.
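To make the spirit yield figure concrete: if PSY is quoted in litres of alcohol per tonne of malt, as described above, the alcohol expected from a purchase is simply PSY multiplied by the tonnage. The minimal sketch below assumes exactly that unit convention; the function name and the PSY values used are hypothetical and are not taken from any real data set.

```python
def expected_litres_of_alcohol(psy_litres_per_tonne: float, tonnes_of_malt: float) -> float:
    """Estimate litres of alcohol obtainable from a malt purchase.

    Hypothetical helper: assumes PSY is quoted in litres of alcohol per tonne of malt.
    """
    if psy_litres_per_tonne < 0 or tonnes_of_malt < 0:
        raise ValueError("PSY and tonnage must be non-negative")
    return psy_litres_per_tonne * tonnes_of_malt


# Hypothetical comparison: two malts differing by 5 litres/tonne over a 1,000 tonne purchase
low = expected_litres_of_alcohol(410.0, 1000.0)
high = expected_litres_of_alcohol(415.0, 1000.0)
print(high - low)  # 5000.0
```

Even a small difference in PSY becomes a large volume of alcohol at commercial scale, which is why distillers pay close attention to it.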

The maltster's role in this process is to take a natural raw material and create a consistent product that maximises the extract potential and spirit yield for their customers. Many factors in the process can influence the final extract potential and spirit yield. For this project we will consider only the relationship between the protein level in the raw barley and the final extract and predicted spirit yield (PSY) of the malt.

<p align="center">
  <img src="img/maltingbarley.jpg" width="500"/></p>
   <p style="text-align: center;"> <b><I><a href="https://www.bratney.com/industries/malting--brewing">Malted Barley</a></I></b> </p>  


### 1.3 Plan for the project

The plan for this project is as follows:
1. Section 2 will give a brief introduction to the malting process and define the important variables for this project
1. In Section 3 I will analyse an existing dataset to summarise the distributions of the various variables and the relationships between them
1. In Section 4 I will synthesise a dataset using the relationships and distributions defined in Section 3
1. Section 5 will cover some rudimentary analysis of the synthesised dataset
1. The project will be concluded in Section 6

## 2. Overview of the Malting Process
### 2.1 Introduction to Malting

In this section I will briefly outline the malting process. This should give a broad understanding of where the data set comes from, how the variables in it are expected to relate to one another, and how these relationships may be influenced by other factors.

A flow diagram of the malting process is shown below:

<p align="center">
  <img src="img/process.png" width="850"/></p>
  <p style="text-align: center;"> <b><I>Malting Process Flow Diagram</I></b> </p> 


### 2.2 Barley Growing and Harvesting
The first, and perhaps most important, stage in the process is barley growing and harvesting. The agronomic program applied to a field of barley during its growing season helps determine the level of nitrogen, and hence protein, in the grain. In Ireland, spring malting barley is typically sown in March and harvested in August [7]. During the growing season, the grower is responsible for ensuring that the agronomic program optimises the nitrogen content of the grain while balancing this against a good yield per acre sown. In Ireland, approximately 70% of the barley required in any given harvest year is for brewing and 30% is for distilling. The maximum protein level allowed for distilling barley is 9.3% [8], while barley for brewing can be accepted up to 10.8% [6]. During the growing season the farmer applies a nitrogen-based fertiliser to improve the growth of the barley, which increases the yield per acre. However, if too much fertiliser is applied (or it is applied too late in the season), the nitrogen level in the harvested barley will be higher [9], [10]. Such barley will be unsuitable for making distilling malt, and may even be rejected for brewing if its protein level is too high.
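The protein limits quoted above can be expressed as a simple intake rule. The sketch below is a hypothetical helper that assumes both limits are inclusive ("up to") and that protein is the only criterion; in practice moisture, variety and other specifications also apply. It returns the lowest-protein end use that a lot qualifies for.

```python
DISTILLING_MAX_PROTEIN = 9.3  # % protein, maximum for distilling barley [8]
BREWING_MAX_PROTEIN = 10.8    # % protein, maximum for brewing barley [6]


def classify_barley(protein_pct: float) -> str:
    """Hypothetical rule: assign a barley lot to an end use using protein alone."""
    if protein_pct <= DISTILLING_MAX_PROTEIN:
        return "distilling"
    if protein_pct <= BREWING_MAX_PROTEIN:
        return "brewing"
    return "reject"


for protein in (8.9, 9.3, 10.1, 11.2):
    print(protein, classify_barley(protein))
# 8.9 distilling, 9.3 distilling, 10.1 brewing, 11.2 reject
```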

<p align="center">
  <img src="img/spraying.jpg" width="650"/></p>
   <p style="text-align: center;"> <b><I><a href="https://www.agriland.ie/farming-news/new-fertiliser-rules-winter-barley/">Barley Spraying</a></I></b> </p>  
   
Another factor that affects protein levels in the grain is the weather during the growing season. Nitrogen levels in the grain are higher earlier in the growing season; this nitrogen comes from the soil and from the applied fertiliser. Later in the season, as the barley ripens ready for harvesting, more starch accumulates in the grains and they increase in size. If the growing season is too dry, the barley ripens early, before the grains have had time to fill out with starch, resulting in higher-protein barley.

### 2.3 Barley Intake, Drying and Storage
Once the grower believes the barley is ready to be harvested, a sample is taken and tested for moisture and protein. If the moisture and protein are within the specification agreed with the maltster, the farmer proceeds to cut the barley. When the barley arrives at the intake it is tested again for moisture and protein, and the maltster segregates it by variety and protein content. Some stores are allocated to distilling barley (lowest protein), while others are allocated to brewing barley. In most years four to five different barley varieties are sown, each with different properties. This allows the maltster to blend the correct mix of varieties to ensure that the malt is right for brewing and distilling.
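Blending works because the protein content of a blend is the tonnage-weighted average of its components. A minimal sketch, assuming all protein figures are reported on the same moisture basis; the function name and the three-variety blend below are hypothetical.

```python
def blend_protein(lots):
    """Tonnage-weighted mean protein (%) of a blend.

    lots: iterable of (tonnes, protein_pct) pairs, all on the same moisture basis.
    """
    lots = list(lots)
    total_tonnes = sum(tonnes for tonnes, _ in lots)
    if total_tonnes <= 0:
        raise ValueError("blend must contain a positive tonnage")
    return sum(tonnes * protein for tonnes, protein in lots) / total_tonnes


# Hypothetical blend of three varieties: (tonnes, % protein)
blend = [(40.0, 9.0), (35.0, 9.6), (25.0, 10.2)]
print(round(blend_protein(blend), 2))  # 9.51
```

The same calculation run in reverse lets a maltster check whether adding a higher-protein variety would push a blend past a customer's limit.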

<p align="center">
  <img src="img/sampling.jpg" width="550"/></p>
   <p style="text-align: center;"> <b><I><a href="http://www.ukmalt.com/barley-requirements">Barley Sampling at Intake</a></I></b> </p> 

The moisture level in freshly cut barley is usually in the range of 15-21%. Barley at this moisture level is unsafe for long-term storage, as the grain will start to respire in storage and generate heat, which is conducive to microbial growth [11]. Grain at this moisture level is also susceptible to mite infestation. The barley must therefore be dried to below 13% moisture for safe storage. All barley storage areas have aeration systems that are used to cool the barley to below 13 °C, which prevents hot spots forming in the grain bulk. Note that barley may have to be stored safely for up to 18 months before some of it is used.
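The drying requirement can be quantified with a dry-matter balance: drying removes water but not dry matter, so, with moisture expressed on a wet (as-is) basis, the final mass is the initial mass × (1 − initial moisture) / (1 − target moisture). The sketch below applies this to a hypothetical tonne of barley and adds an illustrative check against the 13% moisture and 13 °C storage thresholds mentioned above; both helper names are assumptions.

```python
def mass_after_drying(initial_mass_kg, initial_moisture_pct, target_moisture_pct):
    """Grain mass after drying, assuming dry matter is conserved (wet-basis moisture %)."""
    for moisture in (initial_moisture_pct, target_moisture_pct):
        if not (0 <= moisture < 100):
            raise ValueError("moisture must be in [0, 100)")
    dry_matter_kg = initial_mass_kg * (1 - initial_moisture_pct / 100)
    return dry_matter_kg / (1 - target_moisture_pct / 100)


def is_safe_for_storage(moisture_pct, grain_temp_c):
    """Hypothetical check: below 13 % moisture and cooled below 13 degrees C."""
    return moisture_pct < 13.0 and grain_temp_c < 13.0


# One tonne of barley cut at 20 % moisture, dried to 12.5 %
dried_kg = mass_after_drying(1000.0, 20.0, 12.5)
print(round(1000.0 - dried_kg, 1))  # 85.7 kg of water removed
print(is_safe_for_storage(12.5, 10.0))  # True
```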

### 2.4 Steeping
### 2.5 Germination
### 2.6 Kilning
### 2.7 Analysis


## 3. Data Set
### Intro to my data set
### To include - Malt Type, Barley Variety, Percentage Protein, PSY, Extract
### Analyse real data to understand the relationship between Protein and PSY/Extract per Barley Variety
### Record the relationships
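One way to record the protein/extract relationship for each barley variety is a separate straight-line least-squares fit per variety. The sketch below assumes a linear relationship is adequate (something the real data would need to confirm) and uses `numpy.polyfit`; the helper name is hypothetical, and the demonstration data are noise-free points generated from made-up lines, used only to check that the fit recovers them.

```python
import numpy as np


def fit_linear_by_group(groups, x, y):
    """Fit y = slope * x + intercept separately for each group label.

    Returns {group: (slope, intercept)} from a degree-1 least-squares fit.
    """
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fits = {}
    for group in np.unique(groups):
        mask = groups == group
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        fits[str(group)] = (float(slope), float(intercept))
    return fits


# Hypothetical check: points built exactly from known lines should be recovered
protein = np.array([9.0, 9.5, 10.0, 10.5] * 2)
variety = np.array(["Variety A"] * 4 + ["Variety B"] * 4)
extract = np.where(variety == "Variety A", 90.0 - 1.0 * protein, 88.0 - 0.8 * protein)
print(fit_linear_by_group(variety, protein, extract))
```

With real data, the residual scatter around each fitted line would also need recording, since the simulation needs a noise term as well as a slope and intercept.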

## 4. Data Simulation
### Generate 200 Data points
### Split between brewing/distilling to remain as in the original data set
### Barley variety per malt type to be the same
### Brewing / Distilling TPd to be distributed as per our original set
### Relationships between TPd and PSY and EX to be simulated based on Step 3
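The simulation plan above could be sketched as follows: draw the malt type, then a variety and a protein level conditional on the malt type, then extract and PSY from linear protein relationships plus normal noise. Every numeric parameter, the variety names, the linear form and the normal distributions are hypothetical placeholders standing in for whatever Section 3 finds. Note that `numpy.random.default_rng` requires NumPy 1.17 or later, which is newer than the 1.15 documentation cited in [2].

```python
import numpy as np

# Every parameter below is a hypothetical placeholder, to be replaced by the
# distributions and relationships recorded in Section 3.
N_POINTS = 200
MALT_TYPES = ["Brewing", "Distilling"]
MALT_TYPE_PROBS = [0.7, 0.3]  # placeholder brewing/distilling split
VARIETIES = {  # placeholder variety names and proportions per malt type
    "Brewing": (["Variety A", "Variety B"], [0.6, 0.4]),
    "Distilling": (["Variety C", "Variety D"], [0.5, 0.5]),
}
PROTEIN = {"Brewing": (10.0, 0.4), "Distilling": (8.9, 0.3)}  # (mean, sd), % protein
EXTRACT_LINE = (90.0, -1.0, 0.5)  # (intercept, slope per % protein, noise sd)
PSY_LINE = (460.0, -5.0, 2.0)     # (intercept, slope per % protein, noise sd), litres/tonne


def simulate(n=N_POINTS, seed=2018):
    """Return a dict of column arrays for n simulated malt lots."""
    rng = np.random.default_rng(seed)  # seeded so the data set is reproducible
    malt_type = rng.choice(MALT_TYPES, size=n, p=MALT_TYPE_PROBS)
    variety = np.empty(n, dtype=object)
    protein = np.empty(n)
    for mt in MALT_TYPES:
        mask = malt_type == mt
        count = int(mask.sum())
        # Variety and protein are drawn conditionally on the malt type
        names, probs = VARIETIES[mt]
        variety[mask] = rng.choice(names, size=count, p=probs)
        mean, sd = PROTEIN[mt]
        protein[mask] = rng.normal(mean, sd, size=count)

    def linear_with_noise(params):
        intercept, slope, noise_sd = params
        return intercept + slope * protein + rng.normal(0.0, noise_sd, size=n)

    return {
        "malt_type": malt_type,
        "variety": variety,
        "protein": protein,
        "extract": linear_with_noise(EXTRACT_LINE),
        "psy": linear_with_noise(PSY_LINE),
    }


data = simulate()
print({name: column[:3] for name, column in data.items()})
```

Drawing protein conditionally on malt type keeps the brewing/distilling split and the per-type protein distributions separate, so each can be matched to the original data independently. A fuller version might also truncate protein at the specification limits, which this sketch does not do.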

## 5. Discussion

## 6. Conclusion


## References

1. [1] Project Jupyter. Project Jupyter Home. (*https://www.jupyter.org/*)
1. [2] NumPy Development Team. NumPy Random Sampling (`numpy.random`). (*https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.random.html*)
1. [3] Dr. Ian McLoughlin. GMIT. Project 2018: Programming for Data Analysis. (*https://github.com/ianmcloughlin/progda-project-2018/raw/master/project.pdf*)
1. [4] Ivor Murrell. Malt, Unravelling the Mystery. (*http://www.ukmalt.com/malt-unravelling-mystery*)
1. [5] The Institute of Brewing and Distilling. General Certificate in Malting (pp. 68-69). (*http://www.ibdlearningzone.org.uk/article/show/pdf/1126/*)
1. [6] The Maltsters' Association of Great Britain. Barley Requirements. (*http://www.ukmalt.com/barley-requirements*)
1. [7] The Institute of Brewing and Distilling. General Certificate in Malting (p. 16). (*http://www.ibdlearningzone.org.uk/article/show/pdf/1126/*)
1. [8] www.agriland.ie. New Boortmalt/IFA malting barley price arrangement announced. (*https://www.agriland.ie/farming-news/new-boortmalt-ifa-price-arrangement-announced/*)
1. [9] www.yara.co.uk. How to influence barley grain quality. (*https://www.yara.co.uk/crop-nutrition/barley/influencing-barley-grain-quality/*)
1. [10] www.teagasc.ie. The Spring Barley Guide. (*https://www.teagasc.ie/media/website/publications/2015/The-Spring-Barley-Guide.pdf*)
1. [11] The Institute of Brewing and Distilling. General Certificate in Malting (pp. 23-24). (*http://www.ibdlearningzone.org.uk/article/show/pdf/1126/*)