![ASR_LOGO.png](ASR_LOGO.png "ASR")
<img src=https://www.asranalytics.com/application/files/8015/3910/0037/revhub_logo.png width="110" align="center">



# CRISP-DM Python Template



<img src="desktop/crisp-dm-4.png" width = "350" align="right"/>

***
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It provides a structured <br> approach to planning a data mining project and is a robust, well-proven methodology.

The CRISP-DM model consists of six steps:

1. [Business Understanding](#Businessunderstanding)
2. [Data Understanding](#Dataunderstanding)
3. [Data Preparation](#Datapreparation)
4. [Modelling](#Modelling)
5. [Evaluation](#Evaluation)
6. [Deployment](#Deployment)

***

## 1. Business Understanding <a class="anchor" id="Businessunderstanding"></a>
<img src=https://www.flickr.com/photos/187949142@N05/49778491857/in/dateposted-public/ width="35" height="30" align="left"/>

Determine business objectives and assess the situation. Include requirements, data security, and constraints. 

#### Tax Type (Sales Tax, Personal Income Tax, etc.)
- <font color=red>User input required</font>

#### Model Type (Schedule Analysis, Preparer Analysis, Restaurant Sales Tax)
- <font color=red>User input required</font>

#### Requirements
- <font color=red>User input required</font>

#### Security (FTI)
- <font color=red>User input required</font>

#### Constraints
- <font color=red>User input required</font>

***
## 2. Data Understanding <a class = "anchor" id = "Dataunderstanding"></a>
<img src="desktop/2..png" width="30" height="30" align="left"/>

Collect the initial data that the project requires. This includes loading the data and, if it comes from multiple sources, integrating it.

#### Import Required Libraries

>```python
>import numpy as np
>import pandas as pd
>import matplotlib.pyplot as plt
>import seaborn as sns
>#Jupyter magic: render plots inline in the notebook
>%matplotlib inline
>```

#### Import Initial Data

>```python
>#Importing data from .csv
>example = pd.read_csv("example.csv")
>```

#### Initial Data Analysis

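>```python
>example.head()      #First five rows
>example.columns     #Column labels (an attribute, not a method)
>example.info()      #Data types and non-null counts
>example.describe()  #Summary statistics for numeric columns
>```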
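
Beyond the calls above, a per-column overview of data types, missing values, and distinct values is often a useful first check. The following is a minimal sketch: the small in-memory `example` DataFrame and its column names are assumptions standing in for the data loaded from `example.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data loaded from example.csv
example = pd.DataFrame({
    "col1": [10.0, 0.0, 25.0, np.nan],
    "col2": [1, 2, 2, 3],
    "category": ["a", "b", "b", None],
})

# One row per column: data type, number of missing values, number of distinct values
overview = pd.DataFrame({
    "dtype": example.dtypes.astype(str),
    "n_missing": example.isna().sum(),
    "n_unique": example.nunique(),  # missing values are not counted as a distinct value
})
print(overview)
```

Columns with many missing values or a single distinct value are natural candidates for closer inspection in the Data Preparation step.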

***
## 3. Data Preparation <a class = "anchor" id = "Datapreparation"></a>
<img src="desktop/3..png" width="30" height="30" align="left"/>

Data preparation includes additional data acquisition, data cleaning, data integration, and data transformation.

#### Data Acquisition 
From database, files, etc.
>```python
>#Importing additional data from .csv
>example2 = pd.read_csv("example.csv")
>```

#### Data Cleaning 
Identify and fix errors in data, deal with missing data, etc.

>```python
>#Replacing 0s with NaN so pandas treats them as missing values
>cols = ['col1', 'col2']
>df[cols] = df[cols].replace(0, np.nan)
>```
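
Once the zeros are marked as missing, two common follow-ups are dropping the affected rows or filling the gaps with a summary statistic. The sketch below shows both under stated assumptions: the DataFrame, its values, and the choice of the median are illustrative, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which 0 encodes "not reported"
df = pd.DataFrame({
    "col1": [0.0, 4.0, 6.0, 0.0],
    "col2": [1.0, 0.0, 3.0, 5.0],
    "col3": [7, 8, 9, 10],
})
cols = ["col1", "col2"]

# Mark zeros as missing, as in the template step above
df[cols] = df[cols].replace(0, np.nan)
print(df[cols].isna().sum())

# Option A: drop every row that has a missing value in the selected columns
dropped = df.dropna(subset=cols)

# Option B: keep all rows and fill missing values with each column's median
filled = df.copy()
filled[cols] = filled[cols].fillna(filled[cols].median())

print(dropped)
print(filled)
```

Dropping rows is simple but can discard a large share of the data; imputation keeps every row at the cost of introducing estimated values.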

#### Data Integration
Merge data from different sources
>```python
>#Combining two datasets (by default, an inner join on the columns both share)
>df = pd.merge(example, example2)
>```
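
When the join key is known, stating it explicitly makes the merge easier to audit. The sketch below assumes a hypothetical `taxpayer_id` key and made-up values; `how`, `validate`, and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical sources that share a taxpayer_id key
example = pd.DataFrame({"taxpayer_id": [1, 2, 3], "income": [50000, 62000, 48000]})
example2 = pd.DataFrame({"taxpayer_id": [2, 3, 4], "deductions": [5000, 7000, 3000]})

df = pd.merge(
    example,
    example2,
    on="taxpayer_id",       # join key stated explicitly
    how="left",             # keep every row of the first source
    validate="one_to_one",  # raise an error if the key is duplicated on either side
    indicator=True,         # add a _merge column showing where each row came from
)

# Rows marked "left_only" had no match in the second source
print(df["_merge"].value_counts())
```

Checking the `_merge` counts right after the join is a quick way to spot records that failed to match.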

#### Data Transformation and Enrichment
Create new features from existing features
>```python
>#Ratio Variable
>taxdata["Deduction_Ratio"] = taxdata["AZDeductions"] / taxdata["AZAGI"]
>#Indicator Variable: '1' for the flagged PTIN, '0' for all others
>taxdata["Sus_PTIN"] = np.where(taxdata["PTIN"] == 'P00749774', '1', '0')
>```
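
A ratio feature becomes infinite or undefined when its denominator is zero, so it can help to guard against that case. The following sketch assumes a small made-up `taxdata` frame; the PTIN `P00000001` is a hypothetical placeholder, and the integer indicator is one possible encoding rather than the template's string encoding.

```python
import numpy as np
import pandas as pd

# Hypothetical tax records; the second row has zero adjusted gross income
taxdata = pd.DataFrame({
    "AZDeductions": [5000.0, 1200.0, 800.0],
    "AZAGI": [50000.0, 0.0, 16000.0],
    "PTIN": ["P00749774", "P00000001", None],
})

# Treat a zero denominator as missing so the ratio is NaN instead of infinite
agi = taxdata["AZAGI"].replace(0, np.nan)
taxdata["Deduction_Ratio"] = taxdata["AZDeductions"] / agi

# Integer indicator: 1 for the flagged PTIN, 0 otherwise (including missing PTINs)
taxdata["Sus_PTIN"] = (taxdata["PTIN"] == "P00749774").astype(int)

print(taxdata)
```

Most scikit-learn estimators expect numeric inputs, which is why a numeric indicator can be more convenient than the strings `'1'` and `'0'`.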

***
## 4. Modelling <a class = "anchor" id = "Modelling"></a>
<img src="desktop/4..png" width="30" height="30" align="left"/>

Creation of Model and Model Criteria:
- Parameter Settings: <font color=red>User input required</font>
- Model: <font color=red>User input required</font>
- Model Description: <font color=red>User input required</font>

#### Creating Training and Testing Sets
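>```python
>from sklearn.model_selection import train_test_split
>#X holds the features and y the target; 30% of the rows are held out for testing
>X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
>```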

#### Creating the Model
>```python
>from sklearn.tree import DecisionTreeClassifier
>#Create Decision Tree classifier object
>clf = DecisionTreeClassifier()
>```

#### Training the Model
>```python
>clf = clf.fit(X_train, y_train)
>```

#### Testing the Model
>```python
>#Predict the response for the test dataset
>y_pred = clf.predict(X_test)
>#Predict a single new observation (one row with the same number of features as X_train)
>X_TEST = [[86556]]
>outcome = clf.predict(X=X_TEST)
>print(outcome)
>```
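
The four modelling steps above can be run end to end on synthetic data to check that the pieces fit together. This is a minimal sketch, not the template's actual model: the data come from scikit-learn's `make_classification`, and the four features, `max_depth=3`, the `stratify` option, and the values in `new_row` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix X and target y
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=1
)

# Hold out 30% of the rows; stratify keeps the class balance similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# A shallow tree is easier to interpret and less prone to overfitting
clf = DecisionTreeClassifier(max_depth=3, random_state=1)
clf.fit(X_train, y_train)

# Predict the held-out rows, then one new observation with the same 4 features
y_pred = clf.predict(X_test)
new_row = [[0.1, -1.2, 0.5, 2.0]]  # hypothetical values
outcome = clf.predict(new_row)

print(X_train.shape, X_test.shape)
print(outcome)
```

Any row passed to `predict` must have the same number of features, in the same order, as the data used in `fit`.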

***
## 5. Evaluation <a class = "anchor" id = "Evaluation"></a>
<img src="desktop/5..png" width="30" height="30" align="left"/>

Interpret the models according to your domain knowledge, your data mining success criteria and your desired test design.

#### Model Analysis
>```python
>#Confusion Matrix Analysis
>from sklearn.metrics import confusion_matrix
>matrix = confusion_matrix(y_test, y_pred)
>print(matrix)
>```
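
The confusion matrix can be summarized with a few standard metrics. The sketch below uses hand-written label lists as an assumption so that it runs on its own; in the template, `y_test` and `y_pred` would come from the Modelling step.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for a binary classifier
y_test = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
matrix = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = matrix.ravel()

accuracy = accuracy_score(y_test, y_pred)    # share of all predictions that are correct
precision = precision_score(y_test, y_pred)  # share of predicted positives that are correct
recall = recall_score(y_test, y_pred)        # share of actual positives that were found

print(matrix)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Which metric matters most depends on the success criteria set in Business Understanding, for example whether a missed case costs more than a false alarm.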

***
## 6. Deployment <a class = "anchor" id = "Deployment"></a>
<img src="desktop/6..png" width="30" height="30" align="left"/>

Prepare the model for deployment within MarkLogic.

#### Converting to .onnx File
>```python
from skl2onnx import convert_sklearn 
from skl2onnx.common.data_types import FloatTensorType 
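>#Input signature: any number of rows with 4 float features; change 4 to match the model's feature count
initial_type = [('float_input', FloatTensorType([None, 4]))]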
onx = convert_sklearn(clf, initial_types=initial_type) 
with open("ZIP_PREDICT.onnx", "wb") as f: 
    f.write(onx.SerializeToString())
```