
# Machine Learning Wine Classification Project

## Imports

In [1]:
# tensorflow version to use
%tensorflow_version 2.x

# imports
import tensorflow as tf
import pandas as pd
import math

## Data Prep

### Loading the data
In this section, we read the red and white wine CSV files from my GitHub account with pandas. The files are semicolon-separated, so we pass `sep=';'`. `pd.read_csv` already returns a pandas DataFrame; the second step only converts every column to `float64`.

In [2]:
# urls of the data
red_url = 'https://raw.githubusercontent.com/murraycoding/Artificial-Intelligence/main/winequality-red.csv'
white_url = 'https://raw.githubusercontent.com/murraycoding/Artificial-Intelligence/main/winequality-white.csv'
# open the urls with pandas
red_csv = pd.read_csv(red_url, sep=';')
white_csv = pd.read_csv(white_url, sep=';')
# read_csv already returns DataFrames; this casts every column to float64
red_df = pd.DataFrame(red_csv, dtype='float64')
white_df = pd.DataFrame(white_csv, dtype='float64')
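The loading cell above relies on two pandas behaviors: `sep=';'` for semicolon-delimited files, and `pd.read_csv` returning a DataFrame on its own, so the extra step is only a dtype cast. A minimal sketch on an in-memory sample; the two rows copy the first three values of the first two rows in the printed evaluation frame further down and are for illustration only, not the full files.

```python
import io

import pandas as pd

# two semicolon-separated rows, truncated to three of the eleven feature columns
sample = io.StringIO(
    "fixedacidity;volatileacidity;citricacid\n"
    "7.4;0.70;0.00\n"
    "7.8;0.88;0.00\n"
)

df = pd.read_csv(sample, sep=";")  # already a DataFrame
df = df.astype("float64")          # same effect as pd.DataFrame(df, dtype='float64')

print(type(df).__name__)  # DataFrame
print(df.dtypes)
```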

### Preparing the data
In the code below, we add a `color` column to each dataset (0 for red, 1 for white) and later drop the original `quality` column. This turns the problem into predicting the color of a wine from its chemical measurements (acidity, residual sugar, chlorides, sulfur dioxide, density, pH, sulphates, and alcohol).

In [3]:
# adds a 'color' label column: red = 0, white = 1 (quality is dropped further down)
red_df = red_df.assign(color=0)
white_df = white_df.assign(color=1)

# number of rows of EACH color to hold out for evaluation
num_eval = 300

# splits each color into evaluation data (the first num_eval rows) and training data (the rest)
red_df_eval = red_df[:num_eval]
red_df_train = red_df[num_eval:]
white_df_eval = white_df[:num_eval]
white_df_train = white_df[num_eval:]

# one new combined dataframe with both the red and the white wine data (both training and evaluation data)
wine_df_train = pd.concat([red_df_train,white_df_train], ignore_index=True)
wine_df_eval = pd.concat([red_df_eval,white_df_eval], ignore_index=True)

# separates the data into features (inputs) and labels (color); quality is dropped
wine_df_train_result = wine_df_train['color']
wine_df_train_data = wine_df_train.drop(columns=['color', 'quality'])
wine_df_eval_result = wine_df_eval['color']
wine_df_eval_data = wine_df_eval.drop(columns=['color', 'quality'])

print(wine_df_eval_data)
print(red_df_train)

     fixedacidity  volatileacidity  citricacid  ...    pH  sulphates  alcohol
0             7.4             0.70        0.00  ...  3.51       0.56      9.4
1             7.8             0.88        0.00  ...  3.20       0.68      9.8
2             7.8             0.76        0.04  ...  3.26       0.65      9.8
3            11.2             0.28        0.56  ...  3.16       0.58      9.8
4             7.4             0.70        0.00  ...  3.51       0.56      9.4
..            ...              ...         ...  ...   ...        ...      ...
595           6.3             0.33        0.27  ...  3.37       0.54      9.4
596           8.3             0.39        0.70  ...  3.09       0.57      9.4
597           7.2             0.19        0.46  ...  3.19       0.60     11.2
598           7.5             0.17        0.44  ...  3.17       0.45     10.0
599           6.7             0.17        0.50  ...  3.15       0.45     10.3

[600 rows x 11 columns]
      fixedacidity  volatileacidity  ci
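The labeling and hold-out logic of the preparation cell can be checked on toy data. The sketch below uses hypothetical five-row red and white frames with made-up `alcohol` and `quality` values and holds out `num_eval = 2` rows of each color, following the same steps: assign the label, slice head and tail, concatenate, then separate features from labels.

```python
import pandas as pd

# hypothetical toy frames standing in for red_df / white_df (values are made up)
red = pd.DataFrame({"alcohol": [9.4, 9.8, 9.8, 9.8, 9.4], "quality": [5, 5, 5, 6, 5]})
white = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1, 9.9, 9.6], "quality": [6, 6, 6, 6, 6]})

red = red.assign(color=0)      # red = 0
white = white.assign(color=1)  # white = 1

num_eval = 2  # rows of EACH color held out for evaluation

train = pd.concat([red[num_eval:], white[num_eval:]], ignore_index=True)
evaluate = pd.concat([red[:num_eval], white[:num_eval]], ignore_index=True)

# labels and features; quality is dropped together with the label
y_train = train["color"]
X_train = train.drop(columns=["color", "quality"])
y_eval = evaluate["color"]
X_eval = evaluate.drop(columns=["color", "quality"])

print(len(X_train), len(X_eval))  # 6 4
print(y_eval.tolist())            # [0, 0, 1, 1]
```

With `num_eval = 300`, the same logic explains why the printed evaluation frame has 600 rows: the first 300 red rows followed by the first 300 white rows. Because the hold-out takes the first rows of each file rather than a random sample, any ordering in the files carries into the split; shuffling each frame first (for example with `df.sample(frac=1, random_state=0)`) would be one way to avoid that.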

## Training the model
In this section, we train the model and then evaluate it on the held-out evaluation data set aside during data preparation.

### Input Function
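This is the standard input-function pattern from the TensorFlow Estimator tutorials. It converts the feature DataFrame and label Series into a `tf.data.Dataset`, shuffles and repeats it indefinitely in training mode, and groups the rows into batches of 256.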

In [4]:
def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    
    return dataset.batch(batch_size)
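To make the `shuffle(1000).repeat()` and `batch(256)` semantics concrete, here is a plain-Python sketch of what such a pipeline yields. It is an approximation for illustration, not TensorFlow's implementation, and `shuffled_pass` and `input_pipeline` are hypothetical names: a bounded shuffle buffer, an endless repeat that reshuffles on every pass in training mode, and a single ordered pass with a final partial batch in evaluation mode.

```python
import random
from itertools import islice


def shuffled_pass(items, buffer_size, rng):
    """One pass over `items` through a bounded shuffle buffer (like Dataset.shuffle)."""
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer first
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]          # emit a random buffered element...
        buffer[i] = item         # ...and put the new element in its slot
    rng.shuffle(buffer)          # drain whatever is left in random order
    yield from buffer


def input_pipeline(rows, training=True, batch_size=256, buffer_size=1000, seed=0):
    """Hypothetical plain-Python analogue of input_fn: yields lists of rows."""
    rng = random.Random(seed)

    def stream():
        if training:
            while True:  # repeat(): endless, reshuffled on every pass
                yield from shuffled_pass(rows, buffer_size, rng)
        else:
            yield from rows  # evaluation: one ordered pass

    it = stream()
    while True:
        batch = list(islice(it, batch_size))  # batch(): group consecutive rows
        if not batch:
            return
        yield batch


# evaluation mode: 10 rows in batches of 4 -> batch sizes 4, 4, 2
print([len(b) for b in input_pipeline(list(range(10)), training=False, batch_size=4)])
```

Because the training stream never ends, the `steps` argument to `classifier.train` is what bounds training; in evaluation mode the stream ends after one pass, so evaluation stops on its own.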

### Feature Columns
In this section, we define one numeric feature column per input column. The estimator uses these feature columns to look up each feature, by column name, in the dictionary produced by the input function and turn it into model input.

In [5]:
# Feature columns describe how to use the input.
my_feature_columns = []
for key in wine_df_train_data.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
print(my_feature_columns)

[NumericColumn(key='fixedacidity', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='volatileacidity', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='citricacid', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='residualsugar', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='chlorides', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='freesulfurdioxide', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='totalsulfurdioxide', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='density', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='pH', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='sulphates', shape=(1,), default_value=No
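`input_fn` passes `dict(features)` to `from_tensor_slices`, and each `numeric_column(key=...)` later looks its data up by that string key. A quick pandas check of what `dict(...)` produces, on a hypothetical two-column frame:

```python
import pandas as pd

# hypothetical two-column feature frame
features = pd.DataFrame({"pH": [3.51, 3.20], "alcohol": [9.4, 9.8]})

as_dict = dict(features)      # column name -> pandas Series
keys = list(features.keys())  # the same names the feature-column loop iterates over

print(keys)                    # ['pH', 'alcohol']
print(as_dict["pH"].tolist())  # [3.51, 3.2]
```

Any dictionary fed to the trained model, including one assembled from user input, therefore needs these column-name strings as its keys.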

### Building the model
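We are now ready to choose a model. TensorFlow offers several estimators for classification. In this case, we use a `DNNEstimator` (deep neural network) with a `LogisticRegressionHead`, which treats the 0/1 color label as a target in the range [0, 1] and outputs the sigmoid of the network's logit. A `DNNClassifier` with two classes would be the more conventional choice for this binary task.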

In [7]:
my_head = tf.estimator.LogisticRegressionHead()
# Build a DNN with 2 hidden layers of 25 and 15 nodes, topped by a logistic regression head.
classifier = tf.estimator.DNNEstimator(
    head=my_head,
    feature_columns=my_feature_columns,
    hidden_units=[25, 15])

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpbzgtqlfs', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
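For intuition about the model built above, the numpy sketch below runs the forward pass of an 11 → 25 → 15 → 1 network with ReLU hidden layers (the `DNNEstimator` default activation) and a sigmoid output, together with the sigmoid cross-entropy loss that `LogisticRegressionHead` uses. The weights are random placeholders rather than trained values, and `forward` and `sigmoid_cross_entropy` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [11, 25, 15, 1]  # 11 input features, hidden_units=[25, 15], 1 logit

# random placeholder weights and zero biases (a trained model would learn these)
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]


def forward(x):
    """Return P(label = 1) = P(white) for each row of x (shape [batch, 11])."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid


def sigmoid_cross_entropy(p, y):
    """Mean binary cross-entropy between probabilities p and 0/1 labels y."""
    eps = 1e-12  # guards against log(0)
    return float(np.mean(-(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))))


x = rng.normal(size=(4, 11))               # four made-up input rows
y = np.array([[0.0], [0.0], [1.0], [1.0]])  # red, red, white, white
p = forward(x)
print(p.shape)  # (4, 1)
print(sigmoid_cross_entropy(p, y))
```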


### Training
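Now it's time to train the model! The training dataset repeats indefinitely, so `steps=5000` (5,000 batches of 256 rows) is what ends training.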

In [8]:
tf.keras.backend.set_floatx('float64')

classifier.train(
    input_fn=lambda: input_fn(wine_df_train_data, wine_df_train_result, training=True),
    steps=5000)

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float32 by default, call `tf.keras.backend.set_floatx('float32')`. To change just this layer, pass dtype='float32' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpbzgtqlfs/model.ckpt.
INFO:tensorflow:Calling checkpoint listener

<tensorflow_estimator.python.estimator.canned.dnn.DNNEstimatorV2 at 0x7ff6ca284630>
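As a rough sense of scale for `steps=5000`: each step consumes one batch of 256 rows, so training sees 1,280,000 rows in total. The sketch below converts that into passes over the training set, assuming the CSVs hold the standard UCI Wine Quality counts of 1,599 red and 4,898 white rows; those counts are an assumption, since this notebook never prints them.

```python
steps = 5000
batch_size = 256
num_eval = 300

# ASSUMED row counts (standard UCI Wine Quality files); not verified in this notebook
red_rows, white_rows = 1599, 4898

train_rows = (red_rows - num_eval) + (white_rows - num_eval)
epochs = steps * batch_size / train_rows
white_share = (white_rows - num_eval) / train_rows

print(train_rows)             # 5897
print(round(epochs, 1))       # 217.1
print(round(white_share, 3))  # 0.78
```

If those counts hold, roughly 78% of the training rows are white, which is worth keeping in mind when reading the evaluation below: the held-out set is balanced (`label/mean = 0.5`), while the mean prediction is about 0.70.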

In [9]:
eval_result = classifier.evaluate(input_fn=lambda: input_fn(wine_df_eval_data, wine_df_eval_result, training=False))
print(eval_result)

INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float32 by default, call `tf.keras.backend.set_floatx('float32')`. To change just this layer, pass dtype='float32' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-11-05T18:53:29Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpbzgtqlfs/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.26767s
INFO:tensorflow:Finished evaluation at 2020-11-05-18:53:29
INFO:tensorflow:Saving dict for global step 5000: average_loss = 0.5623307212754722, global_step = 5000, label/mean = 0.5, loss = 0.4496868, prediction/mean = 0.6967617181936899
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/tmpbzgtqlfs/mode
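`LogisticRegressionHead` reports loss and means but no accuracy. One way to get it, sketched below under the assumption that the model's P(white) outputs and the 0/1 labels have been collected into arrays, is to threshold the probabilities at 0.5. The sketch also shows the reference point for `average_loss`: a model that always outputs 0.5 scores ln 2 ≈ 0.693, and the reported 0.562 is below that baseline. `accuracy_at_threshold` is a hypothetical helper and the values passed to it are toy numbers.

```python
import math

import numpy as np


def accuracy_at_threshold(p_white, labels, threshold=0.5):
    """Fraction of rows where (p_white >= threshold) matches the 0/1 label (white = 1)."""
    predicted = (np.asarray(p_white) >= threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))


# toy values for illustration only
print(accuracy_at_threshold([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 0.75

# baseline: always predicting 0.5 gives cross-entropy ln 2 on any 0/1 labels
print(round(math.log(2), 3))  # 0.693
```

With the estimator, the probabilities could come from `classifier.predict` on `wine_df_eval_data`, reading the `'predictions'` entry that this head produces for each row.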

### Using the model
We will now use the model by letting the user enter the measurements of a new wine. The model then estimates how likely the wine is to be red or white.

In [None]:
# creates a new input function just for predictions
def predict_input_fn(features, batch_size=256):
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

# wine types
wine_type = ('white','red')

# wine dictionary to store user input
user_wine = {}

for feature in my_feature_columns:
    # asks the user for input
    value = input(f'{feature[0]} = ')
    user_wine[feature] = [float(value)]

predictions = classifier.predict(input_fn=lambda: predict_input_fn(user_wine))
for predict_dict in predictions:
    # print(predict_dict)
    print(f"Chance of white wine: {round(predict_dict['probabilities'][0]*100,1)}%")
    print(f"Chance of red wine: {round(predict_dict['probabilities'][1]*100,1)}%")


fixedacidity = 2
volatileacidity = 2
citricacid = 2
residualsugar = 2
chlorides = 2
freesulfurdioxide = 2
totalsulfurdioxide = 2
density = 2
pH = 2
sulphates = 2
alcohol = 2
INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float32 by default, call `tf.keras.backend.set_floatx('float32')`. To change just this layer, pass dtype='float32' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpnb7jweop/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Chance of white wine: 78.5%
Chance of red wine: 21.5%
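The last cell turns one model output into two percentages. Because the labels were red = 0 and white = 1, the sigmoid output of this head is P(white), and P(red) is its complement. A small pure-Python sketch of that mapping, using a hypothetical `describe_prediction` helper:

```python
def describe_prediction(p_white, wine_type=("red", "white")):
    """Map P(label = 1) to (percent white, percent red, predicted type).

    wine_type is indexed by label, so it must list red (0) before white (1).
    """
    pct_white = round(p_white * 100, 1)
    pct_red = round((1 - p_white) * 100, 1)
    predicted = wine_type[int(p_white >= 0.5)]
    return pct_white, pct_red, predicted


print(describe_prediction(0.75))  # (75.0, 25.0, 'white')
```

Ordering `wine_type` by label index keeps the printed names consistent with how the labels were assigned during data preparation.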
