# **Crop Type Recommendation**

We will build a classifier (a random forest, as fitted in section 4) that recommends a crop type from soil and environmental measurements. The model will then be deployed using Google Cloud and Streamlit.

## **Dataset:**

The dataset is called "Crop Recommendation Dataset" and is publicly available on Kaggle. You can find it at the following link:

[Kaggle dataset](https://www.kaggle.com/atharvaingle/crop-recommendation-dataset?select=Crop_recommendation.csv)

## **Independent variables:**

*   Nitrogen (`N`)
*   Phosphorus (`P`)
*   Potassium (`K`)
*   Temperature (`temperature`)
*   Humidity (`humidity`)
*   pH (`ph`)
*   Rainfall (`rainfall`)


# 1. Installing scikit-learn

In [None]:
!pip install scikit-learn==0.24.1

Collecting scikit-learn==0.24.1
  Downloading scikit_learn-0.24.1-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
     |████████████████████████████████| 22.3 MB 1.6 MB/s
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.0.2
    Uninstalling scikit-learn-1.0.2:
      Successfully uninstalled scikit-learn-1.0.2
Successfully installed scikit-learn-0.24.1


# 2. Importing libraries

In [None]:
import pickle

import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split

# Show the installed scikit-learn version
sklearn.__version__

'0.24.1'

In [None]:
!python3 --version

Python 3.7.12


# 3. Loading the dataset

In [None]:
df = pd.read_csv("/content/Crop_recommendation.csv")
df.head()

index,N,P,K,temperature,humidity,ph,rainfall,label
0,90,42,43,20.879744,82.002744,6.502985,202.935536,rice
1,85,58,41,21.770462,80.319644,7.038096,226.655537,rice
2,60,55,44,23.004459,82.320763,7.840207,263.964248,rice
3,74,35,40,26.491096,80.158363,6.980401,242.864034,rice
4,78,42,42,20.130175,81.604873,7.628473,262.71734,rice


In [None]:
df.describe()

statistic,N,P,K,temperature,humidity,ph,rainfall
count,2200.0,2200.0,2200.0,2200.0,2200.0,2200.0,2200.0
mean,50.551818,53.362727,48.149091,25.616244,71.481779,6.46948,103.463655
std,36.917334,32.985883,50.647931,5.063749,22.263812,0.773938,54.958389
min,0.0,5.0,5.0,8.825675,14.25804,3.504752,20.211267
25%,21.0,28.0,20.0,22.769375,60.261953,5.971693,64.551686
50%,37.0,51.0,32.0,25.598693,80.473146,6.425045,94.867624
75%,84.25,68.0,49.0,28.561654,89.948771,6.923643,124.267508
max,140.0,145.0,205.0,43.675493,99.981876,9.935091,298.560117
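
The `min` and `max` rows above give the range each feature covers in the dataset. Predictions for inputs far outside those ranges are harder to trust, so one possible use of the table is to flag such inputs before calling the model. The sketch below is only an illustration: `FEATURE_RANGES` and `out_of_range` are hypothetical names that do not appear in the notebook, and the bounds are copied by hand from the table.

```python
# Observed (min, max) per feature, copied from the df.describe() output.
FEATURE_RANGES = {
    "N": (0.0, 140.0),
    "P": (5.0, 145.0),
    "K": (5.0, 205.0),
    "temperature": (8.825675, 43.675493),
    "humidity": (14.25804, 99.981876),
    "ph": (3.504752, 9.935091),
    "rainfall": (20.211267, 298.560117),
}


def out_of_range(sample):
    """Return the features whose value lies outside the observed range.

    `sample` maps feature name -> value (hypothetical helper, not part of
    the original notebook).
    """
    flagged = []
    for name, (low, high) in FEATURE_RANGES.items():
        if not low <= sample[name] <= high:
            flagged.append(name)
    return flagged


sample = {"N": 90, "P": 42, "K": 50, "temperature": 20.8,
          "humidity": 80, "ph": 6, "rainfall": 200}
print(out_of_range(sample))                        # []
print(out_of_range({**sample, "rainfall": 500}))   # ['rainfall']
```

A flagged feature does not mean the prediction is wrong, only that the model is being asked about conditions it never saw during fitting.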


In [None]:
# Select the independent variables (every column except the label)
X = df.drop(["label"], axis=1)
X.head()

index,N,P,K,temperature,humidity,ph,rainfall
0,90,42,43,20.879744,82.002744,6.502985,202.935536
1,85,58,41,21.770462,80.319644,7.038096,226.655537
2,60,55,44,23.004459,82.320763,7.840207,263.964248
3,74,35,40,26.491096,80.158363,6.980401,242.864034
4,78,42,42,20.130175,81.604873,7.628473,262.71734


In [None]:
# Select the dependent variable (df["label"] leaves df unchanged, unlike df.pop)
Y = df["label"]
Y

0         rice
1         rice
2         rice
3         rice
4         rice
         ...  
2195    coffee
2196    coffee
2197    coffee
2198    coffee
2199    coffee
Name: label, Length: 2200, dtype: object

In [None]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.8, test_size=0.2, random_state=100)
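
The split above is random, so the class proportions in the test set can drift slightly from those in the full data. A possible variation is to pass `stratify` to `train_test_split`, which keeps the proportions equal in both subsets. The sketch below runs on synthetic stand-in data (an assumption, since the CSV is not bundled here); `X_demo` and `y_demo` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]

# Synthetic stand-in for the crop data (assumption): 2 classes, 100 rows each.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 7)), columns=FEATURES)
y_demo = pd.Series(np.repeat(["rice", "coffee"], 100), name="label")

# stratify=y_demo preserves the class proportions in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100, stratify=y_demo
)

# Each class contributes the same share (20%) of its rows to the test set.
print(y_te.value_counts().to_dict())
```

In the notebook this would amount to adding `stratify=Y` to the existing call; whether that changes the reported score is not something this sketch can show.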

# 4. Fitting the random forest (RF) model

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest with default hyperparameters
model = RandomForestClassifier()
model.fit(X_train, y_train)

RandomForestClassifier()

In [None]:
# Save the model to the working directory
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(model, file)

In [None]:
# Load the model
pkl_filename = "pickle_model.pkl"
with open(pkl_filename, 'rb') as file:
    model = pickle.load(file)
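
A quick way to gain confidence in the save/load step is to check that the restored model predicts exactly what the original did. The sketch below does this on synthetic data (an assumption) and serializes to bytes instead of a file; `clf` and `restored` are hypothetical names. Note that scikit-learn does not guarantee that a pickle loads correctly under a different library version, which is presumably why the notebook pins `scikit-learn==0.24.1`.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption): two well-separated clusters with 7 features.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0, 1, size=(50, 7)),
    rng.normal(5, 1, size=(50, 7)),
])
y_demo = np.array(["rice"] * 50 + ["coffee"] * 50)

clf = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)

# Round trip through pickle, mirroring pickle.dump / pickle.load on a file.
restored = pickle.loads(pickle.dumps(clf))

# The restored model should reproduce the original predictions exactly.
same = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print(same)  # True
```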

# 5. Model performance

In [None]:
# Compute the mean accuracy on the test data
score = model.score(X_test, y_test)
print(score)

0.9954545454545455
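
A single accuracy figure does not show which crops get confused with each other. One way to look deeper, sketched here on synthetic data (an assumption, so the numbers are not comparable to the score above), is a confusion matrix and a per-class report from `sklearn.metrics`. The names `clf`, `cm`, and `labels` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): two clusters standing in for two crop classes.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(0, 1, size=(100, 7)),
    rng.normal(5, 1, size=(100, 7)),
])
y_demo = np.array(["rice"] * 100 + ["coffee"] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=100, stratify=y_demo
)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes, in `labels` order.
labels = ["coffee", "rice"]
cm = confusion_matrix(y_te, y_pred, labels=labels)
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_te, y_pred, labels=labels))
```

With the real data, the same two calls would take `y_test` and `model.predict(X_test)` as arguments.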


# 6. Testing with a new sample

In [None]:
# Feature order must match the training columns: N, P, K, temperature, humidity, ph, rainfall
x_in = np.asarray([90, 42, 50, 20.8, 80, 6, 200]).reshape(1, -1)
predicts = model.predict(x_in)
predicts[0]

'rice'
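
Passing a bare array relies on the values being in the same order as the training columns. A possible alternative is to build the sample as a one-row `DataFrame` with named columns, and to use `predict_proba` to see how confident the forest is. The sketch below uses synthetic training data (an assumption); `FEATURES`, `clf`, and `ranked` are hypothetical names. Newer scikit-learn releases (1.0 and later) also emit a warning when a model fitted on a `DataFrame` receives an unnamed array, which named columns avoid.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]

# Synthetic training data (assumption): two clusters standing in for two crops.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    np.vstack([
        rng.normal(0, 1, size=(50, 7)),
        rng.normal(5, 1, size=(50, 7)),
    ]),
    columns=FEATURES,
)
y_demo = ["rice"] * 50 + ["coffee"] * 50
clf = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)

# One-row DataFrame: the column names make the feature order explicit.
x_new = pd.DataFrame([[90, 42, 50, 20.8, 80, 6, 200]], columns=FEATURES)

# Class probabilities for the sample, sorted from most to least likely.
proba = clf.predict_proba(x_new)[0]
ranked = sorted(zip(clf.classes_, proba), key=lambda pair: pair[1], reverse=True)
for crop, p in ranked:
    print(f"{crop}: {p:.2f}")
```

Showing the top few classes with their probabilities, instead of a single label, may be more useful in a recommendation app, though that is a design choice the notebook does not make.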