## 🌾 Machine Learning: Crop Recommendation System

### 📚 Dataset Description

In this case study, we build a **crop recommendation system** based on **Machine Learning** that helps farmers choose the most suitable crop for their soil and environmental conditions.

The dataset contains **agricultural data** with the parameters that most affect plant growth: **soil nutrients (N, P, K)**, **temperature**, **air humidity**, **soil acidity (pH)**, and **rainfall**. From these parameters, the system **recommends the crop** that fits best.

Each column in the dataset is described below:

| **Column**       | **Data Type**          | **Description**                                                                  |
|------------------|------------------------|----------------------------------------------------------------------------------|
| **N**            | `int`                  | Nitrogen content of the soil, in mg/kg.                                          |
| **P**            | `int`                  | Phosphorus content of the soil, in mg/kg.                                        |
| **K**            | `int`                  | Potassium content of the soil, in mg/kg.                                         |
| **temperature**  | `float`                | Ambient temperature where the crop grows, in degrees Celsius (°C).               |
| **humidity**     | `float`                | Relative air humidity in the growing environment, in percent (%).                |
| **ph**           | `float`                | Soil pH, indicating whether the soil is acidic or alkaline.                      |
| **rainfall**     | `float`                | Annual rainfall in the growing region, in millimeters (mm).                      |
| **label**        | `category` / `string`  | Crop recommended for planting given the parameters above.                        |
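
The schema in the table can be checked programmatically before any training starts. The sketch below builds a tiny frame from the first three rows that appear in the notebook output and verifies each column's dtype family. The `EXPECTED` mapping and the `check_schema` helper are illustrative names introduced here; they are not part of the original notebook.

```python
import pandas as pd
from pandas.api import types as ptypes


def _is_label(series: pd.Series) -> bool:
    """Accept either string-like or categorical columns for the crop label."""
    return ptypes.is_string_dtype(series) or isinstance(series.dtype, pd.CategoricalDtype)


# Expected dtype family per column, taken from the table above (an assumption
# about how strictly the schema should be enforced).
EXPECTED = {
    "N": ptypes.is_integer_dtype,
    "P": ptypes.is_integer_dtype,
    "K": ptypes.is_integer_dtype,
    "temperature": ptypes.is_float_dtype,
    "humidity": ptypes.is_float_dtype,
    "ph": ptypes.is_float_dtype,
    "rainfall": ptypes.is_float_dtype,
    "label": _is_label,
}


def check_schema(frame: pd.DataFrame) -> list:
    """Return a list of schema problems; an empty list means the frame matches."""
    problems = []
    for column, is_expected in EXPECTED.items():
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
        elif not is_expected(frame[column]):
            problems.append(f"unexpected dtype for {column}: {frame[column].dtype}")
    return problems


sample = pd.DataFrame({
    "N": [82, 105, 70],
    "P": [43, 26, 60],
    "K": [41, 47, 25],
    "temperature": [22.627878, 26.989249, 18.751657],
    "humidity": [84.111503, 94.0838, 21.243104],
    "ph": [6.4742, 5.918236, 5.808209],
    "rainfall": [208.148764, 32.816282, 82.755315],
    "label": ["rice", "muskmelon", "maize"],
})

print(check_schema(sample))
```

An empty list means every column is present with the expected dtype family; any other result names the columns that need attention.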

---

### 🚀 Project Workflow
1. **Reading the DataFrame**  
2. **Preprocessing**  
3. **Machine Learning Model Building**  
4. **Model Evaluation and Validation**  
5. **CPU vs GPU Execution Comparison**  
6. **Conclusions & Implementation Recommendations**

---

### 🔧 Steps for Applying Machine Learning


---


## CPU Dataframe Read

In [1]:
%%time
import pandas as pd

df = pd.read_csv("./synthetic_crop.csv")
df.head()

CPU times: user 16.7 s, sys: 582 ms, total: 17.2 s
Wall time: 4.36 s


Unnamed: 0.1,Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,0,82,43,41,22.627878,84.111503,6.4742,208.148764,rice
1,1,105,26,47,26.989249,94.0838,5.918236,32.816282,muskmelon
2,2,70,60,25,18.751657,21.243104,5.808209,82.755315,maize
3,3,34,65,82,19.696618,14.295643,7.977798,62.23801,maize
4,4,94,36,46,27.210002,90.658402,6.042651,115.917395,watermelon
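
The preview shows two extra columns, `Unnamed: 0.1` and `Unnamed: 0`. They look like index columns written out by earlier `to_csv` calls, although that is an assumption about how the file was produced. The later cells never use them as features, so dropping them right after loading is one option. A minimal sketch on an in-memory copy of two preview rows:

```python
import io

import pandas as pd

# Two rows copied from the preview above, held in memory instead of on disk.
csv_text = """Unnamed: 0.1,Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,0,82,43,41,22.627878,84.111503,6.4742,208.148764,rice
1,1,105,26,47,26.989249,94.0838,5.918236,32.816282,muskmelon
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Keep every column whose name does not start with "Unnamed".
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(list(clean.columns))
```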


## GPU Dataframe Read

In [2]:
%%time
import cudf

df_cudf = cudf.read_csv("./synthetic_crop.csv")
df_cudf.head()

CPU times: user 1.7 s, sys: 5.28 s, total: 6.98 s
Wall time: 7.68 s


Unnamed: 0.1,Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall,label
0,0,82,43,41,22.627878,84.111503,6.4742,208.148764,rice
1,1,105,26,47,26.989249,94.0838,5.918236,32.816282,muskmelon
2,2,70,60,25,18.751657,21.243104,5.808209,82.755315,maize
3,3,34,65,82,19.696618,14.295643,7.977798,62.23801,maize
4,4,94,36,46,27.210002,90.658402,6.042651,115.917395,watermelon
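
In this run the GPU read took longer in wall time than the CPU read (7.68 s versus 4.36 s). A single `%%time` measurement of the first GPU call can include one-time costs such as importing the library and initializing the device. Whether that explains the gap here is an assumption; the notebook output does not confirm it. One way to check is to time the call again after a warm-up run. The `time_call` helper below is a hypothetical utility, shown with a stand-in workload so that it runs without a GPU:

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs,
    after `warmup` untimed runs that absorb one-time setup costs."""
    for _ in range(warmup):
        fn()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload. In the notebook this could be, for example,
# lambda: cudf.read_csv("./synthetic_crop.csv")
elapsed = time_call(lambda: sum(range(100_000)))
print(f"best of 3 after warm-up: {elapsed:.6f} s")
```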


## CPU KNN Training

In [3]:
%%time
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Seven numeric inputs and the crop label to predict
features = df[['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall']]
target = df['label']

# Accuracy scores and model names collected for later comparison
acc = []
model = []

# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# KNN with scikit-learn's default settings
knn = KNeighborsClassifier()
knn.fit(x_train, y_train)

predicted_values = knn.predict(x_test)

x = metrics.accuracy_score(y_test, predicted_values)
acc.append(x)
model.append('K Nearest Neighbours')
print("KNN Accuracy is: ", x)

KNN Accuracy is:  0.628183
CPU times: user 2min 28s, sys: 1.62 s, total: 2min 30s
Wall time: 2min 28s
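
The cell imports `classification_report` but prints only the overall accuracy. With accuracy near 0.63, a per-class breakdown can show which crops the model confuses. The sketch below runs on a small synthetic dataset from `make_classification`, not on the crop data, so its numbers say nothing about the notebook's results:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the crop data: 7 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier().fit(X_train, y_train)
predictions = knn.predict(X_test)

# Formatted text report with precision, recall and F1 per class.
print(classification_report(y_test, predictions))

# output_dict=True returns the same numbers as nested dictionaries.
report = classification_report(y_test, predictions, output_dict=True)
```

In the notebook itself the equivalent call would be `classification_report(y_test, predicted_values)` right after the prediction step.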


## GPU KNN Training

In [4]:
%%time
from cuml.preprocessing import LabelEncoder

le = LabelEncoder()
features_cudf = df_cudf[['N', 'P','K','temperature', 'humidity', 'ph', 'rainfall']]
target_cudf = le.fit_transform(df_cudf['label'])

# Note: `target` is the pandas Series from the CPU cell; the encoded labels are in `target_cudf`
target

CPU times: user 523 ms, sys: 169 ms, total: 692 ms
Wall time: 682 ms


0                rice
1           muskmelon
2               maize
3               maize
4          watermelon
              ...    
4999995        cotton
4999996         apple
4999997        coffee
4999998        lentil
4999999      chickpea
Name: label, Length: 5000000, dtype: object
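
`LabelEncoder` turns the crop names into integer codes so the GPU estimators can train on a numeric target. The output above is the original pandas Series, because the cell displays `target` and not `target_cudf`. Predictions from a model trained on encoded labels are integer codes, so they need to be mapped back to crop names. The sketch below uses scikit-learn's `LabelEncoder` as a CPU stand-in; it assumes cuML's encoder offers the same `fit_transform` and `inverse_transform` pair.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# The five labels from the head of the dataset.
labels = np.array(["rice", "muskmelon", "maize", "maize", "watermelon"])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# Classes are stored in sorted order, and each code is an index into that list.
print(list(encoder.classes_))
print(codes.tolist())

# Map integer codes (for example, model predictions) back to crop names.
decoded = encoder.inverse_transform(codes)
print(decoded.tolist())
```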

In [5]:
%%time
from cuml import metrics
from cuml.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier
# from cuml.metrics import classification_report

# Note: re-initialising these lists discards the CPU KNN entry recorded earlier
acc = []
model = []

# Hold out 20% of the rows for testing
x_train_cudf, x_test_cudf, y_train_cudf, y_test_cudf = train_test_split(features_cudf, target_cudf, test_size=0.2, random_state=42)

# KNN with cuML's default settings
knn = KNeighborsClassifier()
knn.fit(x_train_cudf, y_train_cudf)

predicted_values = knn.predict(x_test_cudf)

x = metrics.accuracy_score(y_test_cudf, predicted_values)
acc.append(x)
model.append('K Nearest Neighbours')
print("KNN Accuracy is: ", x)

# print(classification_report(y_test,predicted_values))

KNN Accuracy is:  0.6277909874916077
CPU times: user 22.9 s, sys: 283 ms, total: 23.2 s
Wall time: 22.9 s
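
The CPU and GPU accuracies differ slightly (0.628183 versus 0.6277909874916077). One plausible reason, stated here as an assumption, is that the scikit-learn and cuML versions of `train_test_split` shuffle rows differently even with the same `random_state`, so the two models are scored on different test rows. To compare like with like, the row positions can be split once and reused for both frames. A minimal CPU-only sketch with a few rows from the dataset preview:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

frame = pd.DataFrame({
    "N": [82, 105, 70, 34, 94],
    "ph": [6.4742, 5.918236, 5.808209, 7.977798, 6.042651],
    "label": ["rice", "muskmelon", "maize", "maize", "watermelon"],
})

# Split integer row positions instead of the data itself.
positions = np.arange(len(frame))
train_pos, test_pos = train_test_split(positions, test_size=0.2, random_state=42)

# The same positions can index any frame that holds the same rows in the same
# order (assumed to include a cuDF copy of the data).
train_rows = frame.iloc[train_pos]
test_rows = frame.iloc[test_pos]
print(len(train_rows), len(test_rows))
```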


## CPU Decision Tree Training

In [6]:
%%time
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

DT = DecisionTreeClassifier(criterion="entropy",random_state=2,max_depth=5)

DT.fit(x_train,y_train)

predicted_values = DT.predict(x_test)
x = metrics.accuracy_score(y_test, predicted_values)
acc.append(x)
model.append('Decision Tree')
print("Decision Tree's Accuracy is: ", x*100)

#Print Train Accuracy
dt_train_accuracy = DT.score(x_train,y_train)
print("Training accuracy = ",DT.score(x_train,y_train))
#Print Test Accuracy
dt_test_accuracy = DT.score(x_test,y_test)
print("Testing accuracy = ",DT.score(x_test,y_test))

Decision Tree's Accuracy is:  52.1575
Training accuracy =  0.52228575
Testing accuracy =  0.521575
CPU times: user 1min 23s, sys: 1.14 s, total: 1min 25s
Wall time: 1min 24s


## GPU Decision Tree Training

In [7]:
# %%time
# from cuml.tree import DecisionTreeClassifier
# from cuml import metrics

# DT = DecisionTreeClassifier(criterion="entropy",random_state=2,max_depth=5)

# DT.fit(x_train_cudf,y_train_cudf)

# predicted_values = DT.predict(x_test_cudf)
# x = metrics.accuracy_score(y_test_cudf, predicted_values)
# acc.append(x)
# model.append('Decision Tree')
# print("Decision Tree's Accuracy is: ", x*100)

# #Print Train Accuracy
# dt_train_accuracy = DT.score(x_train_cudf,y_train_cudf)
# print("Training accuracy = ",DT.score(x_train_cudf,y_train_cudf))
# #Print Test Accuracy
# dt_test_accuracy = DT.score(x_test_cudf,y_test_cudf)
# print("Testing accuracy = ",DT.score(x_test_cudf,y_test_cudf))
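
The GPU decision-tree cell is commented out, so the notebook has no GPU counterpart for this model. When a library offers a random forest but no standalone tree, a common workaround is a forest with a single tree, no bootstrapping, and all features considered at every split. The sketch below shows the idea with scikit-learn on synthetic data. Whether cuML's `RandomForestClassifier` accepts the same parameter names and values is an assumption to check against its documentation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the crop data: 7 features, 3 classes.
X, y = make_classification(
    n_samples=500, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=2,
)

single_tree = RandomForestClassifier(
    n_estimators=1,       # one tree only
    bootstrap=False,      # train on all rows, no resampling
    max_features=None,    # consider every feature at each split
    criterion="entropy",  # same criterion as the CPU decision tree cell
    max_depth=5,          # same depth limit as the CPU decision tree cell
    random_state=2,
)
single_tree.fit(X, y)

tree = single_tree.estimators_[0]
print(len(single_tree.estimators_), tree.get_depth())
```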

## CPU Random Forest Training

In [8]:
%%time
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(x_train,y_train)

predicted_values = RF.predict(x_test)

x = metrics.accuracy_score(y_test, predicted_values)
acc.append(x)
model.append('RF')
print("Random Forest Accuracy is: ", x)

#Print Train Accuracy
rf_train_accuracy = RF.score(x_train,y_train)
print("Training accuracy = ",RF.score(x_train,y_train))
#Print Test Accuracy
rf_test_accuracy = RF.score(x_test,y_test)
print("Testing accuracy = ",RF.score(x_test,y_test))

Random Forest Accuracy is:  0.670755
Training accuracy =  0.99648375
Testing accuracy =  0.670755
CPU times: user 11min 38s, sys: 17.5 s, total: 11min 56s
Wall time: 11min 48s
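
The train and test scores of the two CPU tree models tell different stories. The decision tree scores almost the same on both sets (about 0.522), which suggests that a depth-5 tree underfits. The random forest reaches 0.996 on the training set but 0.671 on the test set, which points to overfitting. A small sketch that makes the gap explicit, using the scores printed above:

```python
# Scores copied from the CPU cell outputs above.
scores = {
    "Decision Tree": {"train": 0.52228575, "test": 0.521575},
    "Random Forest": {"train": 0.99648375, "test": 0.670755},
}


def generalization_gap(train: float, test: float) -> float:
    """Train accuracy minus test accuracy; large positive values hint at overfitting."""
    return train - test


gaps = {name: generalization_gap(**s) for name, s in scores.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.4f}")
```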


## GPU Random Forest Training

In [9]:
%%time
from cuml.ensemble import RandomForestClassifier
from cuml import metrics

RF = RandomForestClassifier(n_estimators=20, random_state=0)
RF.fit(x_train_cudf,y_train_cudf)

predicted_values = RF.predict(x_test_cudf)

x = metrics.accuracy_score(y_test_cudf, predicted_values)
acc.append(x)
model.append('RF')
print("Random Forest Accuracy is: ", x)

#Print Train Accuracy
rf_train_accuracy = RF.score(x_train_cudf,y_train_cudf)
print("Training accuracy = ",RF.score(x_train_cudf,y_train_cudf))
#Print Test Accuracy
rf_test_accuracy = RF.score(x_test_cudf,y_test_cudf)
print("Testing accuracy = ",RF.score(x_test_cudf,y_test_cudf))


Random Forest Accuracy is:  0.6548259854316711
Training accuracy =  0.6729572415351868
Testing accuracy =  0.6548259854316711
CPU times: user 57.8 s, sys: 5.98 s, total: 1min 3s
Wall time: 6.87 s
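
The wall times reported by `%%time` throughout the notebook can be turned into speedup factors for the CPU versus GPU comparison. The sketch below parses the printed strings and divides CPU time by GPU time. `parse_wall_time` is a hypothetical helper that handles only the `min`, `s` and `ms` units seen in this notebook.

```python
import re

UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}


def parse_wall_time(text: str) -> float:
    """Convert strings such as '2min 28s' or '682 ms' to seconds."""
    parts = re.findall(r"([\d.]+)\s*(min|ms|s)", text)
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


# (CPU wall time, GPU wall time) copied from the cell outputs.
wall_times = {
    "read_csv": ("4.36 s", "7.68 s"),
    "KNN": ("2min 28s", "22.9 s"),
    "Random Forest": ("11min 48s", "6.87 s"),
}

speedups = {
    step: parse_wall_time(cpu) / parse_wall_time(gpu)
    for step, (cpu, gpu) in wall_times.items()
}
for step, factor in speedups.items():
    print(f"{step}: {factor:.2f}x")
```

A factor below 1 means the GPU step was slower in this run, which is the case for the CSV read. The KNN figure excludes the separate 682 ms label-encoding cell, and the random forest comparison also involves different accuracies (0.670755 on CPU versus about 0.6548 on GPU), so the two runs are not strictly equivalent.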


In [None]:
%%time
import pandas as pd

df = pd.read_csv("synthetic_crop.csv")

df.info()

FileNotFoundError: [Errno 2] No such file or directory: '/root/synthetic_crop.csv'