# Pipeline

A pipeline is the infrastructure built around a machine learning algorithm. It covers every stage: collecting the data, organizing it into training files, training one or more models, and deploying those models in production. Below is the pipeline for image recognition in oil palm plantations. It classifies each image by whether it contains oil palm (the 'has_oilpalm' label). All of the functions it uses are defined in 'oilpalm_package.py'.
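
The stages above can be pictured as a chain of functions, where each stage consumes the previous stage's output. The sketch below is a hypothetical illustration only: `run_pipeline` and the stage names are placeholders, not functions from 'oilpalm_package.py', and the "train" stage is a trivial majority-class baseline on toy data.

```python
from typing import Any, Callable, Sequence

# A stage is any function that takes the previous stage's output and returns a new one.
Stage = Callable[[Any], Any]


def run_pipeline(stages: Sequence[tuple[str, Stage]], data: Any) -> Any:
    """Run each named stage in order, feeding each output into the next stage."""
    for name, stage in stages:
        data = stage(data)
        print(f"finished stage: {name}")
    return data


# Hypothetical placeholder stages on toy (image_id, label) data.
stages = [
    ("collect", lambda _: [("img_0", 1), ("img_1", 0), ("img_2", 1)]),
    ("organize", lambda rows: {"ids": [r[0] for r in rows], "labels": [r[1] for r in rows]}),
    # "Training" here just picks the most frequent label (a majority-class baseline).
    ("train", lambda d: {**d, "model": max(set(d["labels"]), key=d["labels"].count)}),
    # Evaluation: fraction of labels the baseline gets right.
    ("evaluate", lambda d: sum(lbl == d["model"] for lbl in d["labels"]) / len(d["labels"])),
]

accuracy = run_pipeline(stages, None)
```

Keeping each stage a plain function makes it easy to rerun a single step, which is how the notebook cells below are organized.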

In [None]:
import oilpalm_package as op

In [None]:
oilpalm_path = r"C:\Users\nene0\Downloads\widsdatathon2019\traininglabels.csv"

oilpalm = op.read_data(oilpalm_path)
oilpalm = op.sorted_images(oilpalm)
low_score, score_80, has_oilpalm, no_oilpalm = op.dataframes_by_score(oilpalm)
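
`op.dataframes_by_score` is not shown here. The sketch below is a hypothetical re-creation that assumes the label file has the columns 'image_id', 'has_oilpalm' (0/1) and 'score' (annotator confidence in [0, 1]), and that `score_80` means "score of at least 0.8". The threshold and the choice to split classes over all rows are guesses based on the variable names.

```python
import pandas as pd


def split_by_score(labels: pd.DataFrame, threshold: float = 0.8):
    """Hypothetical stand-in for op.dataframes_by_score.

    Assumes columns 'image_id', 'has_oilpalm' (0/1) and 'score' in [0, 1].
    """
    low_score = labels[labels["score"] < threshold]
    score_80 = labels[labels["score"] >= threshold]
    # Class splits over all rows (an assumption; the real function may differ).
    has_oilpalm = labels[labels["has_oilpalm"] == 1]
    no_oilpalm = labels[labels["has_oilpalm"] == 0]
    return low_score, score_80, has_oilpalm, no_oilpalm


toy = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"],
    "has_oilpalm": [1, 0, 1, 0, 0],
    "score": [1.0, 0.6, 0.75, 1.0, 0.8],
})
low_score, score_80, has_oilpalm, no_oilpalm = split_by_score(toy)
print(len(low_score), len(score_80), len(has_oilpalm), len(no_oilpalm))  # 2 3 2 3
```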

In [None]:
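# Split the labels into training and test sets (read back below as train_label.csv and test_label.csv)
op.save_train_test_labels(oilpalm)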
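
How `op.save_train_test_labels` splits the data is not shown. One reasonable approach, sketched below under that assumption, is a stratified split: sample the same fraction of each 'has_oilpalm' class for the test set, so both files keep the original class ratio. `split_train_test` and the 0.2 default are hypothetical; the CSV file names come from the cells below.

```python
import pandas as pd


def split_train_test(labels: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Hypothetical stratified split: test_frac of each has_oilpalm class goes to test."""
    test = labels.groupby("has_oilpalm").sample(frac=test_frac, random_state=seed)
    # Every row not sampled into the test set stays in the training set.
    train = labels.drop(index=test.index)
    return train, test


toy = pd.DataFrame({"image_id": [f"img_{i}.jpg" for i in range(10)],
                    "has_oilpalm": [1] * 5 + [0] * 5})
train, test = split_train_test(toy, test_frac=0.4)
print(len(train), len(test))  # 6 4
# Saving would then be, e.g.:
# train.to_csv("train_label.csv", index=False); test.to_csv("test_label.csv", index=False)
```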

In [None]:
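# Reload the training split and divide it by score and class ("바탕 화면" is the Desktop folder on Korean-language Windows)
new_oilpalm = r"C:\Users\nene0\OneDrive\바탕 화면\Python Learning\DataScienceMod2_LFZ\OilPalm_Kaggle\train_label.csv"
train_label = op.read_data(new_oilpalm)
low_score, score_80, has_oilpalm, no_oilpalm = op.dataframes_by_score(train_label)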

In [None]:
op.num_values(low_score, score_80, has_oilpalm, no_oilpalm)

In [None]:
final_train_labels, final_target = op.final_dataframe_ds(has_oilpalm, score_80, no_oilpalm)
train = op.image_data_cleaning_ds(final_train_labels)
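
The "_ds" suffix on `final_dataframe_ds` and `image_data_cleaning_ds` may stand for downsampling, a common fix for class imbalance. That reading is an assumption. The sketch below shows the idea with hypothetical helpers: `balance_by_downsampling` randomly drops majority-class rows until both classes are the same size (simplified to two inputs instead of three), and `images_to_array` stacks already-loaded uint8 images into a float array scaled to [0, 1]. Reading the image files from disk is omitted.

```python
import numpy as np
import pandas as pd


def balance_by_downsampling(pos: pd.DataFrame, neg: pd.DataFrame, seed: int = 0):
    """Hypothetical: equalize class counts by sampling n rows from each class."""
    n = min(len(pos), len(neg))
    balanced = pd.concat([pos.sample(n=n, random_state=seed),
                          neg.sample(n=n, random_state=seed)])
    # Shuffle so the classes are interleaved, then renumber the rows.
    balanced = balanced.sample(frac=1, random_state=seed).reset_index(drop=True)
    return balanced, balanced["has_oilpalm"]


def images_to_array(images) -> np.ndarray:
    """Stack equally sized uint8 RGB images into one float32 array in [0, 1]."""
    return np.stack(images).astype(np.float32) / 255.0


pos = pd.DataFrame({"image_id": ["p0", "p1"], "has_oilpalm": [1, 1]})
neg = pd.DataFrame({"image_id": [f"n{i}" for i in range(6)], "has_oilpalm": [0] * 6})
labels, target = balance_by_downsampling(pos, neg)
print(len(labels), int(target.sum()))  # 4 2

# Toy 4x4 RGB images standing in for the decoded image files.
X = images_to_array([np.full((4, 4, 3), 255, dtype=np.uint8) for _ in range(len(labels))])
print(X.shape)  # (4, 4, 4, 3)
```

Downsampling discards data, so it trades training-set size for a model that is not biased toward the majority class.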

In [None]:
op.pickle_save_data(train, "train_data_ds")
op.pickle_save_data(final_target, "target_data_ds")
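
Judging by the '.pickle' paths loaded in the next cell, `pickle_save_data(obj, name)` likely writes `<name>.pickle`. A minimal sketch of such a save/load pair with the standard `pickle` module follows; the helper names and the `directory` parameter are hypothetical.

```python
import os
import pickle
import tempfile


def pickle_save(obj, name: str, directory: str = ".") -> str:
    """Write obj to <directory>/<name>.pickle and return the file path."""
    path = os.path.join(directory, f"{name}.pickle")
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def pickle_load(path: str):
    """Read back an object written by pickle_save. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = pickle_save({"target": [1, 0, 1]}, "target_data_ds", tmp)
    restored = pickle_load(path)
print(restored)  # {'target': [1, 0, 1]}
```

Pickling the cleaned arrays means the slow image-loading step runs once; later sessions start from the cell below.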

In [None]:
train_path = r"C:\Users\nene0\OneDrive\바탕 화면\Python Learning\DataScienceMod2_LFZ\OilPalm_Kaggle\train_data_ds.pickle"
target_path = r"C:\Users\nene0\OneDrive\바탕 화면\Python Learning\DataScienceMod2_LFZ\OilPalm_Kaggle\target_data_ds.pickle"
train = op.pickle_load_data(train_path)
target = op.pickle_load_data(target_path)

In [None]:
model = op.build_model()
history = op.train_model(train, target, model, num_batch_size=100, num_epochs=20)
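
If, as the names suggest, `num_batch_size` and `num_epochs` control mini-batch training (for example, passed on to a Keras-style `fit(batch_size=..., epochs=...)`; this is an assumption), each epoch is one shuffled pass over the data in chunks of `num_batch_size` rows. The model then makes `num_epochs × ceil(N / num_batch_size)` weight updates for N training images. The hypothetical generator below shows the batching.

```python
import math

import numpy as np


def minibatch_indices(n_samples: int, batch_size: int, num_epochs: int, seed: int = 0):
    """Yield (epoch, index_array) pairs: one shuffled pass over the data per epoch."""
    rng = np.random.default_rng(seed)
    for epoch in range(num_epochs):
        order = rng.permutation(n_samples)  # reshuffle at the start of every epoch
        for start in range(0, n_samples, batch_size):
            yield epoch, order[start:start + batch_size]  # the last batch may be smaller


batches = list(minibatch_indices(n_samples=250, batch_size=100, num_epochs=2))
print(len(batches), [len(idx) for _, idx in batches])  # 6 [100, 100, 50, 100, 100, 50]
steps_per_epoch = math.ceil(250 / 100)  # 3 updates per epoch
```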

In [None]:
test_path = r"C:\Users\nene0\OneDrive\바탕 화면\Python Learning\DataScienceMod2_LFZ\OilPalm_Kaggle\test_label.csv"

test = op.read_data(test_path)
# Real-world test data has no true labels. Here test_label.csv was split off from the
# original training labels, so its 'has_oilpalm' column can serve as the ground truth.
test_true = test[['has_oilpalm']]
test = op.image_data_cleaning_ds(test)

In [None]:
test_pred = op.prediction(test, model)
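
A binary image classifier typically ends in a single sigmoid unit that outputs P(has_oilpalm = 1) with shape (n, 1). Assuming that is the case here, which the notebook does not confirm, `op.prediction` would need to threshold those probabilities to get 0/1 labels. A hypothetical helper with the usual 0.5 cut-off:

```python
import numpy as np


def probabilities_to_labels(probs, threshold: float = 0.5) -> np.ndarray:
    """Map predicted P(has_oilpalm = 1) to 0/1 labels; values >= threshold become 1."""
    return (np.asarray(probs).ravel() >= threshold).astype(int)


print(probabilities_to_labels([[0.91], [0.12], [0.5], [0.49]]))  # [1 0 1 0]
```

Raising the threshold trades recall for precision, which can matter when false positives are costly.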

In [None]:
op.print_metric(test_true, test_pred)
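
Which metrics `op.print_metric` reports is not shown. The sketch below assumes standard binary-classification metrics computed from the confusion matrix. Note that `test_true` is a one-column DataFrame, so it is flattened first. `binary_metrics` is a hypothetical helper; scikit-learn's `classification_report` gives a comparable summary.

```python
import numpy as np


def binary_metrics(y_true, y_pred) -> dict:
    """Confusion counts plus accuracy, precision, recall and F1 (positive class = 1)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([[1], [0], [1], [1], [0], [0]], [1, 0, 0, 1, 1, 0])
print(metrics["tp"], metrics["fp"], metrics["fn"], metrics["tn"])  # 2 1 1 2
```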