# Instructor Do: Decision Trees

I tried to add a plot of the tree diagram at the end of the file, and it immediately broke: the notebook is no longer reading the CSV. This looks like a Git LFS issue.

The content you're getting is not the usual result of reading a CSV file with pandas. It is the text of a Git LFS (Large File Storage) pointer file. Git LFS is a Git extension for handling large files: the repository stores a small text pointer, and the real file content is kept in separate LFS storage.

If you see this output when reading a CSV file, the file is tracked by Git LFS and its actual content has not been downloaded, so `pd.read_csv` parses the pointer text instead of the data.
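
A quick way to confirm this diagnosis is to look at the first bytes of the file before handing it to pandas. The sketch below assumes a pointer file begins with the `version https://git-lfs.github.com/spec/v1` line that shows up in the notebook output; the helper name `is_lfs_pointer` is hypothetical and not part of pandas or Git LFS.

```python
from pathlib import Path

# Prefix copied from the pointer text that appears in this notebook's output.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Same relative path the notebook uses; nothing is printed if the file is absent.
file_path = Path("../Resources/loans_data_encoded.csv")
if file_path.exists() and is_lfs_pointer(file_path):
    print(f"{file_path} is a Git LFS pointer; run `git lfs pull` before reading it.")
```

Reading only the first few bytes keeps the check cheap even when the real CSV is large.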

Here are a few things to check and troubleshoot:

Ensure Git LFS is Installed:

Make sure that Git LFS is installed and configured on your system. You can download it from the Git LFS website. After installing, run `git lfs install` once to set it up for your user account.

Update LFS Configuration:

Check your repository's .gitattributes file to see whether the CSV file matches a pattern tracked by Git LFS (a line containing `filter=lfs`). If it does, the checked-out file stays a small pointer until the LFS content is downloaded. Either pull the content as described below, or stop tracking the file with Git LFS.

Clone Repository with LFS:

If you've just cloned the repository, make sure Git LFS was installed before you cloned. With Git LFS installed, a regular clone downloads the LFS files as well:

`git clone <repository-url>`

Replace <repository-url> with the URL of your Git repository. The older `git lfs clone` command is deprecated.

Check Git LFS Status:

You can check the status of Git LFS in your repository by running:

`git lfs status`

This shows the LFS-tracked files that have staged or unstaged changes. To list every file managed by Git LFS, run `git lfs ls-files`.

Pull LFS Files:

Ensure that you've pulled the LFS files by running:

`git lfs pull`

This fetches the actual content of the LFS files and replaces the pointers in your working copy.
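
The `.gitattributes` check from the list above can be sketched in Python. This assumes the usual form of a tracking line, in which a path pattern is followed by attributes that include `filter=lfs`; the function name `lfs_tracked_patterns` and the sample text are hypothetical.

```python
def lfs_tracked_patterns(gitattributes_text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        if "filter=lfs" in attributes:
            patterns.append(pattern)
    return patterns


# Hypothetical .gitattributes content, for illustration only.
sample = "*.csv filter=lfs diff=lfs merge=lfs -text\n*.py text\n"
print(lfs_tracked_patterns(sample))
```

This simple split on whitespace does not handle quoted patterns that contain spaces, so treat it as a first look rather than a full parser.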

In [1]:
# Initial imports
import pandas as pd
from pathlib import Path
from sklearn import tree
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

## Loading and Preprocessing Loans Encoded Data

In [2]:
# Loading data
file_path = Path("../Resources/loans_data_encoded.csv")
df_loans = pd.read_csv(file_path)
df_loans.head()



Unnamed: 0,version https://git-lfs.github.com/spec/v1
0,oid sha256:a78dc58c8c0063912dd4220666ed0e08422...
1,size 13628


In [3]:
# Define features set
X = df_loans.copy()
X.drop("bad", axis=1, inplace=True)
X.head()



KeyError: "['bad'] not found in axis"
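
The `KeyError` follows directly from the failed load: a DataFrame built from the pointer text has no `bad` column to drop. A small guard run right after loading would surface the problem earlier and with a clearer message. This is only a sketch; the helper name `missing_columns` is hypothetical, and the pointer column name is the header line that appears in the output above.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# The only column pandas derives when it reads the LFS pointer instead of the data.
pointer_columns = ["version https://git-lfs.github.com/spec/v1"]
missing = missing_columns(pointer_columns, ["bad"])
if missing:
    print(f"Missing expected columns: {missing}. Was the CSV pulled from Git LFS?")
```

In the notebook, the same check could be called as `missing_columns(df_loans.columns, ["bad"])` immediately after `pd.read_csv`.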

In [None]:
# Define target vector
y = df_loans["bad"].values.reshape(-1, 1)
y[:5]



In [None]:
# Splitting into Train and Test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=78)



In [None]:
# Creating StandardScaler instance
scaler = StandardScaler()



In [None]:
# Fitting the StandardScaler
X_scaler = scaler.fit(X_train)



In [None]:
# Scaling data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)


## Fitting the Decision Tree Model

In [None]:
# Creating the decision tree classifier instance
model = tree.DecisionTreeClassifier()



In [None]:
# Fitting the model
model = model.fit(X_train_scaled, y_train)



## Making Predictions Using the Tree Model

In [None]:
# Making predictions using the testing data
predictions = model.predict(X_test_scaled)



## Model Evaluation

In [None]:
# Calculating the confusion matrix
cm = confusion_matrix(y_test, predictions)
cm_df = pd.DataFrame(
    cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"]
)

# Calculating the accuracy score
acc_score = accuracy_score(y_test, predictions)



In [None]:
# Displaying results
print("Confusion Matrix")
display(cm_df)
print(f"Accuracy Score : {acc_score}")
print("Classification Report")
print(classification_report(y_test, predictions))


In [None]:
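# Plotting the fitted decision tree
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(200, 100))
# feature_names is passed as a list; some scikit-learn versions reject a pandas Index
plot_tree(model, filled=True, feature_names=list(X.columns), class_names=["0", "1"], fontsize=10)
plt.show()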