## Load In Dependencies

To run this demonstration notebook, you need the packages imported below to be installed. Installation may take some time.  

#### Note: Environment setup
Running this notebook requires a Microsoft Planetary Computer API key.

To use your API key locally, set the environment variable <i><b>PC_SDK_SUBSCRIPTION_KEY</b></i> or call <i><b>pc.settings.set_subscription_key(&lt;YOUR API Key&gt;)</b></i>.<br>
See <a href="https://planetarycomputer.microsoft.com/docs/concepts/sas/#when-an-account-is-needed">when an account is needed</a> for more information, and <a href="https://planetarycomputer.microsoft.com/account/request">request</a> an account if needed.
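
As a minimal sketch of the two options named above, the snippet below sets the environment variable from Python. The placeholder value is hypothetical, and the commented-out lines assume the `planetary-computer` package is installed and imported as `pc`.

```python
import os

# Hypothetical placeholder; substitute the key issued for your account.
API_KEY_PLACEHOLDER = "<YOUR API Key>"

# Option 1: expose the key through the environment variable named above.
os.environ["PC_SDK_SUBSCRIPTION_KEY"] = API_KEY_PLACEHOLDER

# Option 2 (assumes the planetary-computer package is installed):
# import planetary_computer as pc
# pc.settings.set_subscription_key(API_KEY_PLACEHOLDER)

key_is_set = bool(os.environ.get("PC_SDK_SUBSCRIPTION_KEY"))
print("Subscription key configured:", key_is_set)
```

Avoid committing a real key to version control; reading it from the environment keeps it out of the notebook.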

In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Data science: pandas for working with data in DataFrame structures
import pandas as pd

# Feature Engineering
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Machine Learning
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, accuracy_score,classification_report,confusion_matrix
from sklearn.ensemble import RandomForestClassifier 
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

In [2]:
crop_data = pd.read_csv("crop_data.csv")
crop_data.head()

Unnamed: 0,Latitude and Longitude,Class of Land,vh,vv,RVI,RVI(min),RVI(max),RVI(median),vh(min),vh(max),vh(median),vv(min),vv(max),vv(median)
0,"(10.323727047081501, 105.2516346045924)",Rice,0.043734,0.205613,0.640923,0.587695,0.694151,0.640923,0.003147,3.314999,0.038026,0.015913,162.339966,0.148608
1,"(10.322364360592521, 105.27843410554115)",Rice,0.040909,0.168547,0.701644,0.665469,0.737819,0.701644,0.002306,1.885325,0.03383,0.007239,5.564671,0.143516
2,"(10.321455902933202, 105.25254306225168)",Rice,0.041375,0.203101,0.619992,0.577957,0.662027,0.619992,0.002989,3.314999,0.035738,0.015913,162.339966,0.145177
3,"(10.324181275911162, 105.25118037576274)",Rice,0.044391,0.206562,0.645607,0.593127,0.698088,0.645607,0.003147,3.314999,0.038637,0.01631,162.339966,0.149594
4,"(10.324635504740822, 105.27389181724476)",Rice,0.042551,0.174365,0.704493,0.655041,0.753944,0.704493,0.002744,0.46744,0.036355,0.01219,14.588789,0.14711


## Model Building


<p align="justify"> Now let us select the columns required for our model building exercise. We will use VV, VH, and RVI together with their minimum, maximum, and median statistics as predictors. It does not make sense to use latitude and longitude as predictor variables, as they have no direct impact on the presence of a rice crop.</p>

In [3]:
crop_data = crop_data[['vh', 'vv', 'RVI', 'RVI(min)', 'RVI(max)', 'RVI(median)', 'vh(min)', 'vh(max)', 'vh(median)', 'vv(min)', 'vv(max)', 'vv(median)', 'Class of Land']]


### Train and Test Split 

<p align="justify">We will now split the data into 90% training data and 10% test data (<i><b>test_size=0.10</b></i>), stratified by class so that both sets keep the same class balance. Scikit-learn (imported as “sklearn”) is a robust library for machine learning in Python. Its <i><b>model_selection</b></i> module provides the splitting function <i><b>train_test_split</b></i>, which we use here.</p>

In [4]:
X = crop_data.drop(columns=['Class of Land']).values
y = crop_data ['Class of Land'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10,stratify=y,random_state=40)
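
To see what `stratify` does, the following sketch repeats the split on a small synthetic dataset (the arrays `X_demo` and `y_demo` are made up for illustration and are not the crop data) and counts the classes on each side.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop data: 100 samples, 2 features, balanced labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = np.array(["Rice"] * 50 + ["Non Rice"] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.10, stratify=y_demo, random_state=40
)

# With stratify, each split keeps the 50/50 class ratio of the full data.
train_counts = dict(zip(*np.unique(y_tr, return_counts=True)))
test_counts = dict(zip(*np.unique(y_te, return_counts=True)))
print(train_counts, test_counts)
```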

### Feature Scaling 

<p align="justify"> Before starting model training, we may have to run several data pre-processing steps. Here we demonstrate scaling the VV, VH, and RVI features with Standard Scaler.</p>

<p align = "justify">Feature scaling is a data preprocessing step for numerical features. Many machine learning algorithms, such as models trained by gradient descent, k-nearest neighbours, and regularized linear and logistic regression, need scaled data to produce good results. Scikit-learn provides functions that can be used to apply data scaling. Here we are using Standard Scaler.</p>

<h4 style="color:rgb(195, 52, 235)"><strong>Tip 4 </strong></h4>
<p align="justify">Participants might explore other feature scaling techniques like Min Max Scaler, Max Absolute Scaling, Robust Scaling etc.</p>
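
As a rough illustration of how the scalers mentioned in the tip differ, the sketch below applies each one to a toy column containing a single large outlier. The values are invented for the example and are not taken from the crop data.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler

# Toy column with one large outlier.
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

scalers = {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler(),
    "MaxAbsScaler": MaxAbsScaler(),
    "RobustScaler": RobustScaler(),
}

# Fit each scaler on the toy column and keep the scaled values for comparison.
scaled = {name: s.fit_transform(values).ravel() for name, s in scalers.items()}
for name, col in scaled.items():
    print(f"{name:>14}: {np.round(col, 2)}")
```

RobustScaler centres on the median and divides by the interquartile range, so the four small values stay well spread out, whereas MinMaxScaler and MaxAbsScaler squeeze them close to zero because the outlier defines the range.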

In [5]:
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [6]:
'''from sklearn.model_selection import GridSearchCV

# liblinear supports both the l1 and l2 penalties searched below (lbfgs supports l2 only)
lr_param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000], 'penalty': ['l1', 'l2']}
lr_grid_search = GridSearchCV(LogisticRegression(solver='liblinear'), lr_param_grid, cv=5)
lr_grid_search.fit(X_train, y_train)
lr_model = lr_grid_search.best_estimator_'''
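
The grid search pattern can be tried out on synthetic data before running it on the crop features. This is a minimal sketch: `make_classification` supplies made-up data, and the smaller grid is chosen only to keep the run short. The solver has to support every penalty in the grid, and `liblinear` handles both `l1` and `l2`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary data standing in for the scaled training set.
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Every combination of C and penalty is scored with 5-fold cross-validation.
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}
search = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid, cv=5)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")

# The best estimator is refitted on all of the data passed to fit().
best_model = search.best_estimator_
```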

In [7]:
'''rf_param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20, 30], 'min_samples_split': [2, 5, 10], 'min_samples_leaf': [1, 2, 4]}
rf_grid_search = GridSearchCV(RandomForestClassifier(random_state=42), rf_param_grid, cv=5)
rf_grid_search.fit(X_train, y_train)
rf_model = rf_grid_search.best_estimator_'''



In [8]:
'''# Create and train neural nets model
nn_model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
nn_model.fit(X_train, y_train)

# Create and train Naive Bayes model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)'''


In [9]:
'''gb_param_grid = {'n_estimators': [50, 100, 200], 'learning_rate': [0.01, 0.1, 0.2], 'max_depth': [3, 5, 7]}
gb_grid_search = GridSearchCV(GradientBoostingClassifier(), gb_param_grid, cv=5)
gb_grid_search.fit(X_train, y_train)
gb_model = gb_grid_search.best_estimator_'''


In [10]:
'''# Create and train Decision Trees model
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)'''


In [11]:
'''knn_param_grid = {'n_neighbors': [3, 5, 7, 10], 'weights': ['uniform', 'distance'], 'p': [1, 2]}
knn_grid_search = GridSearchCV(KNeighborsClassifier(), knn_param_grid, cv=5)
knn_grid_search.fit(X_train, y_train)
knn_model = knn_grid_search.best_estimator_'''


### Model Training

<p align="justify">Now that we have the data in a format appropriate for machine learning, we can begin training models. In this demonstration notebook, we train seven classifiers from the scikit-learn library: logistic regression, random forest, a neural network (multi-layer perceptron), Gaussian naive Bayes, gradient boosting, a decision tree, and k-nearest neighbours. The library offers a wide range of other models, each with extensive options for parameter tuning and customization.</p>

<p align="justify">Scikit-learn models require separation of the predictor variables and the response variable. You have to store the predictor variables in the array X and the response variable in the array y. You must make sure not to include the response variable in X. It also doesn't make sense to use latitude and longitude as predictor variables in such a confined area, so we drop those too.</p>

In [12]:
# Create and train logistic regression
lr_model = LogisticRegression(solver='lbfgs')
lr_model.fit(X_train,y_train)

# Create and train random forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Create and train neural nets model
nn_model = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42)
nn_model.fit(X_train, y_train)

# Create and train Naive Bayes model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Create and train Gradient Boosting model
gb_model = GradientBoostingClassifier()
gb_model.fit(X_train, y_train)

# Create and train Decision Trees model
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)

# Create and train K-Nearest Neighbors model
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)

In [13]:
'''# Importing necessary libraries
from sklearn.model_selection import cross_val_score

# Define the models
models = [lr_model, rf_model, nn_model, nb_model, gb_model, dt_model, knn_model]
model_names = ['Logistic Regression', 'Random Forest', 'Neural Nets', 'Naive Bayes', 'Gradient Boosting', 'Decision Trees', 'K-Nearest Neighbors']

# Perform cross-validation for each model
for model, model_name in zip(models, model_names):
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(f'{model_name} - Cross-Validation Accuracy: {scores.mean():.2f} (+/- {scores.std() * 2:.2f})')'''
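
The cross-validation loop above passes the unscaled `X` to each model. One way to keep scaling inside the cross-validation is to wrap the scaler and the model in a pipeline. The sketch below does this on synthetic data (`X_demo` and `y_demo` are made up) and is only one possible arrangement.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# The pipeline refits the scaler on the training part of every fold,
# so the held-out part never influences the scaling statistics.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs"))
scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
print(f"Cross-validation accuracy: {scores.mean():.2f} (+/- {scores.std() * 2:.2f})")
```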



## Model Evaluation

Now that we have trained our models, all that is left is to evaluate them. For evaluation, we will generate a classification report and plot the confusion matrix for each model. Scikit-learn provides many other metrics that can be used for evaluation. You can even write your own.
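
As an example of writing a metric by hand, the sketch below derives precision, recall, and F1 for the "Rice" class from a confusion matrix and compares the result with scikit-learn's `f1_score`. The six labels are invented for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

y_true = np.array(["Rice", "Rice", "Rice", "Non Rice", "Non Rice", "Non Rice"])
y_pred = np.array(["Rice", "Rice", "Non Rice", "Non Rice", "Non Rice", "Rice"])

# Rows are true labels, columns are predictions, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=["Non Rice", "Rice"])
tn, fp, fn, tp = cm.ravel()

# Treat "Rice" as the positive class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

library_f1 = f1_score(y_true, y_pred, pos_label="Rice")
print(f"manual F1 = {manual_f1:.3f}, sklearn F1 = {library_f1:.3f}")
```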

### In-Sample Evaluation
<p align="justify"> We will generate a classification report and a confusion matrix for the training data. It must be stressed that this is in-sample performance testing, that is, performance testing on the training dataset. These metrics are NOT truly indicative of the model's performance. You should wait to test the model performance on the test data before you feel confident about your model.</p>

In this section, we make predictions on the training set and store them in one <b><i>insample_predictions</i></b> variable per model (for example, <b><i>insample_predictions_LR</i></b>). A confusion matrix can then be generated for each model to gauge its robustness. 

In [14]:
insample_predictions_LR = lr_model.predict(X_train)
insample_predictions_RF = rf_model.predict(X_train)
insample_predictions_NN = nn_model.predict(X_train)
insample_predictions_NB = nb_model.predict(X_train)
insample_predictions_GB = gb_model.predict(X_train)
insample_predictions_DT = dt_model.predict(X_train)
insample_predictions_KNN = knn_model.predict(X_train)

In [15]:
# scikit-learn metrics expect (y_true, y_pred): true labels first, predictions second
print('******Logistic Regression******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_LR)))
print(classification_report(y_train, insample_predictions_LR))
print('\n')
print('******Random Forest******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_RF)))
print(classification_report(y_train, insample_predictions_RF))
print('\n')
print('******Neural Nets******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_NN)))
print(classification_report(y_train, insample_predictions_NN))
print('\n')
print('******Naive Bayes******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_NB)))
print(classification_report(y_train, insample_predictions_NB))
print('\n')
print('******Gradient Boosting******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_GB)))
print(classification_report(y_train, insample_predictions_GB))
print('\n')
print('******Decision Tree******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_DT)))
print(classification_report(y_train, insample_predictions_DT))
print('\n')
print('******KNeighbors******')
print("Insample Accuracy {0:.2f}%".format(100*accuracy_score(y_train, insample_predictions_KNN)))
print(classification_report(y_train, insample_predictions_KNN))

******Logistic Regression******
Insample Accuracy 95.56%
              precision    recall  f1-score   support

    Non Rice       0.95      0.96      0.96       270
        Rice       0.96      0.95      0.96       270

    accuracy                           0.96       540
   macro avg       0.96      0.96      0.96       540
weighted avg       0.96      0.96      0.96       540



******Random Forest******
Insample Accuracy 100.00%
              precision    recall  f1-score   support

    Non Rice       1.00      1.00      1.00       270
        Rice       1.00      1.00      1.00       270

    accuracy                           1.00       540
   macro avg       1.00      1.00      1.00       540
weighted avg       1.00      1.00      1.00       540



******Neural Nets******
Insample Accuracy 100.00%
              precision    recall  f1-score   support

    Non Rice       1.00      1.00      1.00       270
        Rice       1.00      1.00      1.00       270

    accuracy       

<p> For plotting the confusion matrices, we define the function <b><i>plot_confusion_matrices</i></b>.</p>

In [16]:
def plot_confusion_matrices(models, model_names, X, y, labels):
    """
    Plots confusion matrices for a list of models.

    Parameters:
    - models: A list of trained models
    - model_names: A list of names corresponding to each model
    - X: Input features
    - y: True labels
    - labels: Class labels for the confusion matrices
    """
    num_models = len(models)
    num_rows = (num_models + 1) // 2
    num_cols = 2

    plt.figure(figsize=(15, 5 * num_rows))

    for i, (model, model_name) in enumerate(zip(models, model_names), 1):
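        predictions = model.predict(X)
        # Pass labels so the matrix rows and columns follow the same order as the tick labels
        cm = confusion_matrix(y, predictions, labels=labels)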

        plt.subplot(num_rows, num_cols, i)
        sns.heatmap(cm, annot=True, fmt='g', cmap='Blues', xticklabels=labels, yticklabels=labels)
        plt.title(f"Confusion Matrix - {model_name}")
        plt.xlabel('Predicted labels')
        plt.ylabel('True labels')

    plt.tight_layout()
    plt.show()
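
The tick labels on a heatmap only line up with the matrix if both use the same class order. By default `confusion_matrix` sorts the classes, so "Non Rice" comes before "Rice". The sketch below, on invented labels, shows the default order next to an explicit order set through the `labels` parameter.

```python
from sklearn.metrics import confusion_matrix

y_true = ["Rice", "Rice", "Non Rice", "Non Rice", "Non Rice"]
y_pred = ["Rice", "Non Rice", "Non Rice", "Non Rice", "Non Rice"]

# Default: classes are sorted, so row/column 0 is "Non Rice".
cm_default = confusion_matrix(y_true, y_pred)

# Explicit order: row/column 0 is "Rice", matching tick labels ['Rice', 'Non Rice'].
cm_explicit = confusion_matrix(y_true, y_pred, labels=["Rice", "Non Rice"])

print(cm_default)
print(cm_explicit)
```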



In [17]:
'''models = [lr_model, rf_model, nn_model, nb_model, gb_model, dt_model, knn_model]
model_names = ['Logistic Regression', 'Random Forest', 'Neural Nets', 'Naive Bayes', 'Gradient Boosting', 'Decision Trees', 'K-Nearest Neighbors']
insample_predictions = [insample_predictions_LR, insample_predictions_RF, insample_predictions_NN,
                         insample_predictions_NB, insample_predictions_GB, insample_predictions_DT, insample_predictions_KNN]

plot_confusion_matrices(models, model_names, X_train, y_train, labels=['Rice', 'Non Rice'])'''



### Out-of-Sample Evaluation

When evaluating a machine learning model, it is essential to correctly and fairly evaluate the model's ability to generalize. This is because models have a tendency to overfit the dataset they are trained on. To estimate the out-of-sample performance, we will predict on the test data now. 
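
The gap between in-sample and out-of-sample performance is easy to reproduce. The sketch below trains an unpruned decision tree on noisy synthetic data (generated with `make_classification`, not the crop data) and compares accuracy on the training and held-out sets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y assigns a random class to 20% of the samples (label noise).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, flip_y=0.2, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# An unpruned tree can memorise the training set, noise included.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = tree.score(X_tr, y_tr)
test_acc = tree.score(X_te, y_te)
print(f"train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

A perfect training score says little on its own; the held-out score is the one to compare across models.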

In [18]:
outsample_predictions_LR = lr_model.predict(X_test)
outsample_predictions_RF = rf_model.predict(X_test)
outsample_predictions_NN = nn_model.predict(X_test)
outsample_predictions_NB = nb_model.predict(X_test)
outsample_predictions_GB = gb_model.predict(X_test)
outsample_predictions_DT = dt_model.predict(X_test)
outsample_predictions_KNN = knn_model.predict(X_test)

In [19]:
print("*******Logistic Regression******")
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_LR, y_test)))
print(classification_report(y_test, outsample_predictions_LR))
print('\n')
print("*******Random Forest******")
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_RF, y_test)))
print(classification_report(y_test, outsample_predictions_RF))
print('\n')
print("*******Neural Nets******")
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_NN, y_test)))
print(classification_report(y_test, outsample_predictions_NN))
print('\n')
print('******Naive Bayes******')
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_NB, y_test)))
print(classification_report(y_test, outsample_predictions_NB))
print('\n')
print('******Gradient Boosting******')
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_GB, y_test)))
print(classification_report(y_test, outsample_predictions_GB))
print('\n')
print('******Decision Tree******')
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_DT, y_test)))
print(classification_report(y_test, outsample_predictions_DT))
print('\n')
print('******KNeighbors******')
print("Accuracy {0:.2f}%".format(100*accuracy_score(outsample_predictions_KNN, y_test)))
print(classification_report(y_test, outsample_predictions_KNN))

*******Logistic Regression******
Accuracy 96.67%
              precision    recall  f1-score   support

    Non Rice       0.97      0.97      0.97        30
        Rice       0.97      0.97      0.97        30

    accuracy                           0.97        60
   macro avg       0.97      0.97      0.97        60
weighted avg       0.97      0.97      0.97        60



*******Random Forest******
Accuracy 100.00%
              precision    recall  f1-score   support

    Non Rice       1.00      1.00      1.00        30
        Rice       1.00      1.00      1.00        30

    accuracy                           1.00        60
   macro avg       1.00      1.00      1.00        60
weighted avg       1.00      1.00      1.00        60



*******Neural Nets******
Accuracy 100.00%
              precision    recall  f1-score   support

    Non Rice       1.00      1.00      1.00        30
        Rice       1.00      1.00      1.00        30

    accuracy                           1.00

In [20]:
'''models = [lr_model, rf_model, nn_model, nb_model, gb_model, dt_model, knn_model]
model_names = ['Logistic Regression', 'Random Forest', 'Neural Nets', 'Naive Bayes', 'Gradient Boosting', 'Decision Trees', 'K-Nearest Neighbors']
outsample_predictions = [outsample_predictions_LR, outsample_predictions_RF, outsample_predictions_NN,
                         outsample_predictions_NB, outsample_predictions_GB, outsample_predictions_DT, outsample_predictions_KNN]

plot_confusion_matrices(models, model_names, X_test, y_test, labels=['Rice', 'Non Rice'])'''

In [21]:
submission_vh_vv_rvi_data = pd.read_csv("export_data.csv")
submission_vh_vv_rvi_data.head()

Unnamed: 0,Latitude and Longitude,Class of Land,vh,vv,RVI,RVI(min),RVI(max),RVI(median),vh(min),vh(max),vh(median),vv(min),vv(max),vv(median)
0,"(10.18019073690894, 105.32022315786804)",,0.022501,0.149149,0.488522,0.459407,0.517636,0.488522,0.001122,0.536283,0.017607,0.003845,1.086427,0.132135
1,"(10.561107033461816, 105.12772097986661)",,0.031486,0.151358,0.628084,0.614971,0.641197,0.628084,0.0009,0.401623,0.024936,0.003047,1.123161,0.134585
2,"(10.623790611954897, 105.13771401411867)",,0.039603,0.136223,0.791227,0.770222,0.812231,0.791227,0.003468,0.313,0.035717,0.006694,3.830926,0.117729
3,"(10.583364246115156, 105.23946127195805)",,0.042935,0.147232,0.794516,0.789525,0.799506,0.794516,0.000641,4.416507,0.01404,0.001529,9.210921,0.053367
4,"(10.20744446668854, 105.26844107128906)",,0.027665,0.156761,0.554118,0.526656,0.581579,0.554118,0.001192,3.002685,0.023604,0.004182,7.247988,0.137934


In [22]:
#submission_vh_vv_rvi_data = submission_vh_vv_rvi_data[['vh','vv','RVI']]

In [23]:
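# Caution: this refits the scaler on the submission data, replacing the training-set statistics the models were trained with.
# To scale the submission data exactly like the training data, skip this cell and reuse the scaler fitted earlier.
sc.fit(submission_vh_vv_rvi_data[['vh', 'vv', 'RVI', 'RVI(min)', 'RVI(max)', 'RVI(median)', 'vh(min)', 'vh(max)', 'vh(median)', 'vv(min)', 'vv(max)', 'vv(median)']])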

In [24]:
# Feature Scaling 
#submission_vh_vv_rvi_data = submission_vh_vv_rvi_data.values
transformed_submission_data = sc.transform(submission_vh_vv_rvi_data[['vh', 'vv', 'RVI', 'RVI(min)', 'RVI(max)', 'RVI(median)', 'vh(min)', 'vh(max)', 'vh(median)', 'vv(min)', 'vv(max)', 'vv(median)']])
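
A scaler is normally fitted once on the training data and then reused on any new data, so that the model sees features on the scale it was trained on. The toy sketch below (invented numbers) contrasts reusing a fitted scaler with refitting it on the new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [2.0], [4.0]])        # mean 2
new_data = np.array([[10.0], [12.0], [14.0]])  # mean 12

scaler = StandardScaler().fit(train)

# Reusing the fitted scaler keeps new data on the training scale.
reused = scaler.transform(new_data)

# Refitting centres the new data on its own mean, hiding the shift.
refit = StandardScaler().fit_transform(new_data)

print(reused.ravel())
print(refit.ravel())
```

After refitting, the new data looks as if it were centred on zero, although every value lies far above the training range.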

In [25]:
#Making predictions
final_predictions = rf_model.predict(transformed_submission_data)
final_prediction_series = pd.Series(final_predictions)

In [26]:
#Combining the results into dataframe
submission_df = pd.DataFrame({'id':submission_vh_vv_rvi_data['Latitude and Longitude'].values, 'target':final_prediction_series.values})

In [27]:
#Displaying the sample submission dataframe
display(submission_df)

Unnamed: 0,id,target
0,"(10.18019073690894, 105.32022315786804)",Rice
1,"(10.561107033461816, 105.12772097986661)",Rice
2,"(10.623790611954897, 105.13771401411867)",Rice
3,"(10.583364246115156, 105.23946127195805)",Non Rice
4,"(10.20744446668854, 105.26844107128906)",Rice
...,...,...
245,"(10.308283266873062, 105.50872812216863)",Non Rice
246,"(10.582910017285496, 105.23991550078767)",Non Rice
247,"(10.581547330796518, 105.23991550078767)",Rice
248,"(10.629241357910818, 105.15315779432643)",Rice


In [28]:
#Dumping the predictions into a csv file.
submission_df.to_csv("challenge_1_submission_rice_crop_prediction.csv",index = False)

In [29]:
'''import geopandas as gpd
from shapely.geometry import Point
import matplotlib.pyplot as plt

# Read your CSV file
df = pd.read_csv("crop_data.csv")

# Assuming your DataFrame is named df and the column containing coordinates is 'Latitude and Longitude'
# If the coordinates are in a tuple format within strings, you can convert them to Point objects
df['geometry'] = df['Latitude and Longitude'].apply(lambda x: Point(float(x.replace('(', '').replace(')', '').replace(' ', '').split(',')[1]), float(x.replace('(', '').replace(')', '').replace(' ', '').split(',')[0])))

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(df, geometry='geometry')

# Read built-in world dataset
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Extract Vietnam from the world dataset
vietnam = world[world['name'] == 'Vietnam']

# Plotting
fig, ax = plt.subplots(figsize=(10, 8))
vietnam.plot(ax=ax, color='lightgreen')  # Plot Vietnam boundaries
gdf.plot(ax=ax, color='red', marker='o', markersize=50)
plt.title('Latitude and Longitude Plot on Vietnam Map')
plt.show()'''
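
The coordinate strings can also be parsed without chained `replace` calls. This is a minimal sketch using `ast.literal_eval` from the standard library. The two rows are copied from the table shown earlier, and the `Latitude` and `Longitude` column names are introduced here for illustration only.

```python
import ast
import pandas as pd

df_demo = pd.DataFrame({
    "Latitude and Longitude": [
        "(10.323727047081501, 105.2516346045924)",
        "(10.322364360592521, 105.27843410554115)",
    ]
})

# literal_eval safely turns the "(lat, lon)" text into a tuple of floats.
coords = df_demo["Latitude and Longitude"].apply(ast.literal_eval)
df_demo["Latitude"] = [lat for lat, _ in coords]
df_demo["Longitude"] = [lon for _, lon in coords]
print(df_demo[["Latitude", "Longitude"]])
```

When building point geometries, pass longitude as x and latitude as y, as the cell above does.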


'import geopandas as gpd\nfrom shapely.geometry import Point\nimport matplotlib.pyplot as plt\n\n# Read your CSV file\ndf = pd.read_csv("crop_data.csv")\n\n# Assuming your DataFrame is named df and the column containing coordinates is \'Latitude and Longitude\'\n# If the coordinates are in a tuple format within strings, you can convert them to Point objects\ndf[\'geometry\'] = df[\'Latitude and Longitude\'].apply(lambda x: Point(float(x.replace(\'(\', \'\').replace(\')\', \'\').replace(\' \', \'\').split(\',\')[1]), float(x.replace(\'(\', \'\').replace(\')\', \'\').replace(\' \', \'\').split(\',\')[0])))\n\n# Create a GeoDataFrame\ngdf = gpd.GeoDataFrame(df, geometry=\'geometry\')\n\n# Read built-in world dataset\nworld = gpd.read_file(gpd.datasets.get_path(\'naturalearth_lowres\'))\n\n# Extract Vietnam from the world dataset\nvietnam = world[world[\'name\'] == \'Vietnam\']\n\n# Plotting\nfig, ax = plt.subplots(figsize=(10, 8))\nvietnam.plot(ax=ax, color=\'lightgreen\')  # Plot Vietnam