# Quality Prediction in Iron Ore Mining

Our aim is to predict the percentage of silica in the iron ore concentrate at the end of the mining process.

### Importing the Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Importing the Dataset

In [None]:
df = pd.read_csv(
    "../input/MiningProcess_Flotation_Plant_Database.csv",
    decimal=",",
    parse_dates=["date"],
    infer_datetime_format=True,
).drop_duplicates()

### A basic analysis of the dataset

In [None]:
df.head()

* The target we have to predict is **% Silica Concentrate**.
* Silica is the impurity in the iron ore that needs to be removed.
* The current process of detecting silica takes many hours.
* With some analysis and modelling of the data, we can give a good approximation of the silica concentrate, which saves much of the time and effort required for processing iron ore.

In [None]:
df.shape

In [None]:
df = df.dropna()
df.shape

The shape is unchanged after `dropna()`, so there are no null values in the dataset.
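Comparing shapes before and after `dropna()` works, but counting the missing values per column is more direct. The following is a minimal sketch on a small made-up frame; the name `toy`, the column `feature_a`, and all values are illustrative assumptions, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df; the values are made up.
toy = pd.DataFrame({
    "feature_a": [55.2, 56.1, np.nan],
    "% Silica Concentrate": [1.3, 2.1, 1.8],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that dropna() would remove; 0 means no nulls were present.
rows_removed = len(toy) - len(toy.dropna())
print(rows_removed)
```

On the real data, `df.isna().sum()` run before `dropna()` would give the same kind of per-column count.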

In [None]:
df.describe()

A basic description of the dataset.

In [None]:
plt.figure(figsize=(30, 25))
# Correlate numeric columns only; the parsed date column is excluded
p = sns.heatmap(df.select_dtypes("number").corr(), annot=True)

The plot above shows the correlations between the features.
From it we can identify the features that are most strongly correlated with % Silica Concentrate.
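A large heatmap can be hard to read, so it may help to pull out only the target's column of the correlation matrix and sort it by absolute value. The sketch below uses synthetic data; `toy`, `feature_a`, and `feature_b` are hypothetical stand-ins for the real frame and its columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df: feature_a drives the target, feature_b is noise.
rng = np.random.default_rng(0)
n = 200
feature_a = rng.normal(size=n)
feature_b = rng.normal(size=n)
toy = pd.DataFrame({
    "feature_a": feature_a,
    "feature_b": feature_b,
    "% Silica Concentrate": 2.0 * feature_a + 0.1 * rng.normal(size=n),
})

target = "% Silica Concentrate"

# Absolute correlation of every feature with the target, strongest first.
corr_with_target = (
    toy.corr()[target]
    .drop(target)
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_target)
```

Keep in mind that this only captures linear relationships; a feature with a low correlation can still be useful to a non-linear model.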

### Preparing the Dataset

Next, we drop the features that we will not use for modelling.

In [None]:
df = df.drop(
    [
        'date',
        '% Iron Concentrate',
        'Ore Pulp pH',
        'Flotation Column 01 Air Flow',
        'Flotation Column 02 Air Flow',
        'Flotation Column 03 Air Flow',
    ],
    axis=1,
)

In [None]:
df.head()

In [None]:
Y = df['% Silica Concentrate']
X = df.drop(['% Silica Concentrate'], axis=1)

### Scaling the features

In [None]:
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

In [None]:
X_scaled = pd.DataFrame(min_max_scaler.fit_transform(X), columns=X.columns)

### Splitting the Data

Now we split the data into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X_scaled, Y, test_size=0.3, random_state=42)
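One caveat about the order of the steps: the scaler was fitted on all of `X` before the split, so the test rows contributed to the minimum and maximum used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic data; `X_toy`, `y_toy`, and the column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix and target.
rng = np.random.default_rng(42)
X_toy = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
y_toy = pd.Series(rng.normal(size=100), name="target")

# Split first, so the test rows stay unseen during fitting.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = MinMaxScaler()
X_tr_scaled = pd.DataFrame(
    scaler.fit_transform(X_tr), columns=X_tr.columns, index=X_tr.index
)
X_te_scaled = pd.DataFrame(
    scaler.transform(X_te), columns=X_te.columns, index=X_te.index
)
```

With this order the scaled test values can fall slightly outside the range 0 to 1, which is expected.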

### Training a Model

#### Using Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
reg = LinearRegression()

In [None]:
_ = reg.fit(X_train, Y_train)

In [None]:
predictions = reg.predict(X_test)
predictions

Finding Mean Squared Error

In [None]:
from sklearn.metrics import mean_squared_error

In [None]:
error = mean_squared_error(Y_test, predictions)
error
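The mean squared error is expressed in squared units of the target, which makes it hard to interpret on its own. Two companion metrics are often reported alongside it: the root mean squared error (same units as the target) and the coefficient of determination R². A small sketch with made-up arrays follows; `y_true` and `y_pred` are illustrative, not model outputs from this notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions for illustration.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

mse = mean_squared_error(y_true, y_pred)  # mean of squared residuals
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_pred)             # 1.0 would be a perfect fit

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
```

In this notebook the same calls would take `Y_test` and `predictions` as arguments.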

#### Using Stochastic Gradient Descent

In [None]:
from sklearn.linear_model import SGDRegressor

In [None]:
# Fix the seed so the SGD result is reproducible between runs
reg_sgd = SGDRegressor(max_iter=1000, tol=1e-3, random_state=42)

In [None]:
_ = reg_sgd.fit(X_train, Y_train)

In [None]:
predictions_sgd = reg_sgd.predict(X_test)

Finding Mean Squared Error

In [None]:
error_sgd = mean_squared_error(Y_test, predictions_sgd)
error_sgd
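To judge whether either error is good, it can help to compare both models against a trivial baseline that always predicts the training mean. The sketch below does this on synthetic data; `X_toy`, `y_toy`, and the coefficients used to generate them are assumptions for illustration only, and the numbers it prints say nothing about the mining dataset.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic linear data with features already in the range 0 to 1.
rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(500, 4))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

models = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
    "SGD": SGDRegressor(max_iter=1000, tol=1e-3, random_state=42),
}

# Fit each model and record its test-set mean squared error.
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = mean_squared_error(y_te, model.predict(X_te))

# Print the models from lowest to highest error.
for name, mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {mse:.4f}")
```

A model whose error is not clearly below the baseline has learned little from the features.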