In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# **<span style="color:#DC143C;">Feyn is inspired by Richard Feynman's path integral formulation. That is why the Python module is called Feyn, and why the Q in QLattice stands for Quantum.</span>**

![](https://dogtrainingobedienceschool.com/pic/3665224_full-richard-feynman-hard-work-quotes-feynman-s-hack-and-the-map-of-the-cat-safal-niveshak.jpg)dogtrainingobedienceschool.com

#Code by Casper Wilstrup  https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

<center style="font-family:verdana;"><h1 style="font-size:200%; padding: 20px; background: #001f3f;"><i><b style="color:orange;">Feyn and QLattice</b></i></h1></center>

"Feyn is a Python module for interacting with the QLattice."

"The QLattice is a machine learning technology that helps you search through an infinite list of potential mathematical models to solve your problem."

"It's a quantum-inspired simulation where you make decisions when exploring the data, giving you a good understanding of the relationships in your data and closing the loop between scientific inquiry and data science."

https://docs.abzu.ai/

https://docs.abzu.ai/docs/guides/getting_started/community.html

In [None]:
!pip install feyn

In [None]:
import feyn

In [None]:
data = '/kaggle/input/rainprediction/AusDataForRainPred.csv'
df = pd.read_csv(data)
df.head().style.set_properties(**{'background-color':'purple',
                                     'color': 'white'})

In [None]:
#Code by Casper Wilstrup https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

df["week"] = pd.to_datetime(df.Date).dt.isocalendar().week.astype(int)
df = df.drop("Date", axis=1)
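
The cell above replaces each date with its ISO week number. As a minimal sketch of that step, the toy frame below (dates invented for illustration) shows the mapping. Note that ISO weeks can assign the first days of January to week 52 or 53 of the previous year.

```python
import pandas as pd

# Toy dates, invented for illustration; the notebook itself uses df["Date"].
toy = pd.DataFrame({"Date": ["2008-12-01", "2010-01-03", "2017-06-25"]})

# Parse the strings, then take the ISO week number (1-53) as a plain int.
toy["week"] = pd.to_datetime(toy["Date"]).dt.isocalendar().week.astype(int)
toy = toy.drop("Date", axis=1)

print(toy["week"].tolist())  # [49, 53, 25]
```

The second date, 3 January 2010, lands in week 53 because ISO weeks run Monday to Sunday and that Sunday still belongs to the last week of 2009.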

In [None]:
#Code by Casper Wilstrup https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

stypes = {
    "Location": "cat",
    "WindGustDir": "cat",
    "WindDir9am": "cat",
    "WindDir3pm": "cat",
    "RainToday": "cat",
    "week": "cat",
}
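
The dictionary above marks columns as categorical (`"cat"`). It was written by hand; a sketch of how it could be derived from the column dtypes follows. The helper `infer_stypes` and its `extra_categorical` parameter are hypothetical names, not part of Feyn, and the toy values are invented.

```python
import pandas as pd

def infer_stypes(frame, extra_categorical=()):
    """Map every non-numeric column, plus any listed extras, to "cat"."""
    stypes = {}
    for column in frame.columns:
        is_numeric = pd.api.types.is_numeric_dtype(frame[column])
        if not is_numeric or column in extra_categorical:
            stypes[column] = "cat"
    return stypes

# Toy frame with invented values; column names follow the notebook.
toy = pd.DataFrame({
    "Location": ["Albury", "Perth"],
    "WindGustDir": ["N", "SW"],
    "Sunshine": [7.1, 9.0],
    "week": [49, 25],
})

# "week" is stored as an integer, so it has to be listed explicitly.
print(infer_stypes(toy, extra_categorical=("week",)))
```

Numeric columns that are categorical by meaning, such as `week`, cannot be detected from the dtype alone, which is why the sketch takes them as an explicit argument.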

In [None]:
df.isnull().sum()

#The target variable

There are over 3000 observations with no data for the target variable. We drop those observations up front.

We also notice that the target variable is boolean but expressed as a string ("Yes"/"No"). Let us convert it to a proper boolean (True/False, which behaves like 1/0 in arithmetic).

In [None]:
#Code by Casper Wilstrup https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

df = df.dropna(subset=["RainTomorrow"])
df["RainTomorrow"] = df["RainTomorrow"] == "Yes"
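
The order of the two steps matters. A small sketch on invented data shows why: comparing a missing label with `"Yes"` yields `False`, so without the `dropna` a missing target would silently be recorded as "no rain".

```python
import numpy as np
import pandas as pd

# Toy target column with one missing label (values invented).
toy = pd.DataFrame({"RainTomorrow": ["Yes", "No", np.nan, "Yes"]})

# Comparing first: the missing label quietly becomes False.
naive = toy["RainTomorrow"] == "Yes"

# Dropping first, as the notebook does, keeps only rows with a real label.
clean = toy.dropna(subset=["RainTomorrow"]).copy()
clean["RainTomorrow"] = clean["RainTomorrow"] == "Yes"

print(naive.tolist())                               # [True, False, False, True]
print(clean["RainTomorrow"].astype(int).tolist())   # [1, 0, 1]
```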

#Drop columns with missing values

In [None]:
df = df.drop(["Evaporation", "Sunshine", "Cloud9am", "Cloud3pm"], axis=1)
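
The notebook does not state how these four columns were chosen; presumably they stood out in the `df.isnull().sum()` output. A sketch of making that choice explicit follows. The helper name `columns_mostly_missing` and the 30% cutoff are assumptions for illustration, and the toy values are invented.

```python
import numpy as np
import pandas as pd

def columns_mostly_missing(frame, max_missing=0.3):
    """Return the columns whose share of missing values exceeds max_missing."""
    missing_share = frame.isnull().mean()
    return missing_share[missing_share > max_missing].index.tolist()

# Toy frame with invented values; column names follow the notebook.
toy = pd.DataFrame({
    "Location": ["Albury", "Sydney", "Perth", "Cobar"],
    "Sunshine": [np.nan, np.nan, 7.1, np.nan],
    "RainToday": ["No", None, "Yes", "No"],
})

to_drop = columns_mostly_missing(toy)
print(to_drop)  # ['Sunshine']
toy = toy.drop(to_drop, axis=1)
```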

#Dropping the remainder

There are still many missing values. We set aside the rows that contain them, because a later extension of this notebook might demonstrate how to use the QLattice to impute data.

In [None]:
#Code by Casper Wilstrup https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

na_data = df[df.isna().any(axis=1)].copy() # This dataframe holds observations where any of the values are missing
full_data = df.dropna() # This dataframe holds observations where *none* of the values are missing
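
The two frames partition the data: every row ends up in exactly one of them. A minimal check of that property on an invented toy frame:

```python
import pandas as pd

# Toy frame with invented values; column names follow the notebook.
toy = pd.DataFrame({
    "WindGustDir": ["N", None, "SW", "E"],
    "RainToday": ["No", "Yes", None, "No"],
})

na_rows = toy[toy.isna().any(axis=1)].copy()  # at least one value missing
full_rows = toy.dropna()                      # no value missing

print(na_rows.index.tolist())    # [1, 2]
print(full_rows.index.tolist())  # [0, 3]
```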

In [None]:
full_data

In [None]:
train, test = feyn.tools.split(full_data, ratio=(1,1), random_state=42)
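
With `ratio=(1,1)` the data is divided into two parts of equal weight, so half of the rows are used for training and half for testing. The sketch below reproduces the idea with plain pandas. It is a stand-in to show the concept, not Feyn's implementation, and `split_half` is a hypothetical helper.

```python
import pandas as pd

def split_half(frame, random_state=42):
    """Shuffle the rows and cut them into two halves of (nearly) equal size."""
    shuffled = frame.sample(frac=1.0, random_state=random_state)
    cut = len(shuffled) // 2
    return shuffled.iloc[:cut], shuffled.iloc[cut:]

# Toy frame with invented values.
toy = pd.DataFrame({"x": range(10), "RainTomorrow": [True, False] * 5})
toy_train, toy_test = split_half(toy)

print(len(toy_train), len(toy_test))  # 5 5
```

Fixing `random_state` makes the shuffle repeatable, so the same rows land in the same half on every run.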

#Community QLattice

We are now ready to connect to the QLattice. The feyn module will look in our local configuration file to see if we have a commercial QLattice. If not, it will allocate a community QLattice for us on the Abzu compute cluster.

In [None]:
ql = feyn.connect_qlattice()

#Reproducibility

The QLattice will be reset when we get it, but to ensure that we get exactly the same result every time we run the notebook, we need to seed the QLattice. This is done with the reset method.

In [None]:
ql.reset(random_seed=42)

#Search for the best model

We are now ready to instruct the QLattice to search for the best mathematical model to explain the data. Here we use the high-level convenience function that does everything with sensible defaults: https://docs.abzu.ai/docs/guides/essentials/auto_run.html.

For more detailed control, we could use the primitives: https://docs.abzu.ai/docs/guides/primitives/using_primitives.html

Notice that the stypes dictionary we created earlier gets passed to the QLattice here.

NOTE: This will take several minutes to complete. It involves work done remotely on the QLattice machine as well as in the local notebook. The part that runs locally slows things down because of the limited CPU resources on Kaggle.

In [None]:
models = ql.auto_run(train, output_name="RainTomorrow", kind="classification", stypes=stypes)

#Evaluate

The QLattice has found a mathematical relationship that relates the predictors to the output. The final step is to evaluate the model on the test and the train set. To do that, we plot the ROC curve of the classifier on both the test and the training data. You can read more about ROC curves here: https://docs.abzu.ai/docs/guides/plotting/roc_curve.html

In this case the two curves overlap almost perfectly, which indicates that the model generalizes well to unseen data.

In [None]:
models[0].plot_roc_curve(train)
models[0].plot_roc_curve(test)
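
The visual comparison can be backed by a number. A common summary of a ROC curve is the area under it (AUC), which equals the share of (positive, negative) pairs that the model ranks correctly. The sketch below computes it in plain Python on invented labels and scores; `roc_auc` is a hypothetical helper, not a Feyn function.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs
    in which the positive gets the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented labels and scores standing in for a train and a test set.
train_auc = roc_auc([1, 0, 1, 0, 1], [0.9, 0.2, 0.7, 0.4, 0.3])
test_auc = roc_auc([0, 1, 0, 1, 0], [0.1, 0.8, 0.5, 0.6, 0.7])

print(round(train_auc, 3), round(test_auc, 3))  # 0.833 0.833
```

A train AUC close to the test AUC is the numeric counterpart of overlapping curves; a train AUC far above the test AUC would suggest overfitting.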

#Confusion matrix

A simpler and less powerful way to evaluate classifiers is a confusion matrix: https://docs.abzu.ai/docs/guides/plotting/confusion_matrix.html. Let us see how it looks at two thresholds, 0.3 and 0.5.

In [None]:
models[0].plot_confusion_matrix(test, threshold=.3)

In [None]:
models[0].plot_confusion_matrix(test, threshold=.5)
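
The threshold decides which predicted probabilities count as "rain". A sketch in plain Python, on invented labels and probabilities, shows how the four cells of the matrix shift when the threshold moves from 0.3 to 0.5. The helper `confusion_counts` is hypothetical, and treating a probability equal to the threshold as positive is an assumption.

```python
def confusion_counts(labels, probabilities, threshold):
    """Count the four confusion-matrix cells, predicting 1 when p >= threshold."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, p in zip(labels, probabilities):
        predicted = p >= threshold
        if predicted and y:
            counts["tp"] += 1
        elif predicted and not y:
            counts["fp"] += 1
        elif y:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Invented labels and predicted probabilities.
labels = [1, 0, 1, 0, 0, 1]
probabilities = [0.9, 0.35, 0.4, 0.1, 0.6, 0.2]

at_03 = confusion_counts(labels, probabilities, 0.3)
at_05 = confusion_counts(labels, probabilities, 0.5)
print(at_03)  # {'tp': 2, 'fp': 2, 'fn': 1, 'tn': 1}
print(at_05)  # {'tp': 1, 'fp': 1, 'fn': 2, 'tn': 2}
```

Lowering the threshold catches more rainy days (more true positives) at the cost of more false alarms.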

#Only by copying Casper's code could I have math in my notebook.

What the QLattice actually finds is an equation that relates the input to the output. The user can control the complexity and structure of the equation in various ways. Here we just went with the defaults. Let's see the actual mathematical expression:

In [None]:
models[0].sympify(2)
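
To make "an equation that relates the input to the output" concrete, the sketch below evaluates a made-up equation of the kind a classifier could produce, assuming, as is common for classifiers, that a logistic function turns a weighted sum into a probability. The inputs `x1` and `x2`, the weights, and the bias are all hypothetical; they are not the expression the QLattice found.

```python
import math

def rain_probability(x1, x2, w1=0.08, w2=-0.05, bias=45.0):
    """Hypothetical model: logistic function of a weighted sum of two inputs."""
    z = w1 * x1 + w2 * x2 + bias
    return 1.0 / (1.0 + math.exp(-z))

# Two invented observations.
p_high = rain_probability(80.0, 1005.0)
p_low = rain_probability(30.0, 1020.0)

print(round(p_high, 2), round(p_low, 2))
```

Because the whole model is a closed-form expression, a prediction can be recomputed by hand or in any language without the training library.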

#Feature interaction

Finally, we can see how the features interact by plotting the equation with the Pearson correlation of each node. See more here: https://docs.abzu.ai/docs/guides/plotting/model_plot.html

In [None]:
models[0].plot(test)
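
For reference, the Pearson correlation measures how strongly two series vary together on a scale from -1 to 1. The sketch below computes it in plain Python for an invented node output against an invented target. Which two series the plot actually correlates at each node is presumably the node's output and the true output, but that is an assumption here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values: the output of one node and the true target.
node_output = [0.1, 0.4, 0.35, 0.8, 0.9]
rain_tomorrow = [0, 0, 1, 1, 1]

r = pearson(node_output, rain_tomorrow)
print(round(r, 2))  # 0.71
```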

#Conclusion by Casper Wilstrup https://www.kaggle.com/wilstrup/use-qlattice-to-predict-rainy-days-in-australia/notebook

In a few simple steps we were able to:

Find a mathematical model that predicts rain.

Show that it generalizes very well to new data.

Understand which features interact to predict rain.

Visualize the performance of the model in various ways.