<a href="https://colab.research.google.com/github/sbhattac/nsf-tip-ai-trust/blob/main/NSF_TIP_AI_Trust_POGIL_Activity_Fall_2023.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Activity sponsored by grant titled Targeted Infusion Project: Exposing Students to the Social Relevance and Trustworthiness of Artificial Intelligence, from the National Science Foundation (NSF), Award Number: 2205502

### Faculty Research Team for NSF grant funded project:
* Dr. Sambit Bhattacharya, PI, Professor of Computer Science
* Dr. Bogdan Czejdo, Co-PI, Belk Distinguished Professor of Science & Technology
* Dr. Khalid Lodhi, Co-PI, Professor & Director Forensic Science Program
* Dr. Zahra Shekarkhar, Co-PI, Associate Professor of Criminal Justice
* Dr. Xiaochen Hu, Co-PI, Associate Professor of Criminal Justice

### Data used in this activity is described in the following publication:
<i> Hu, Xiaochen, Xudong Zhang, and Nicholas Lovrich. "Public perceptions of police behavior during traffic stops: logistic regression and machine learning approaches compared." Journal of Computational Social Science 4 (2021): 355-380. </i>
    





In [None]:
#@title # Assigning Roles and Responsibilities within each group

#@markdown ---
#@markdown ### Enter Instructor Name:
Instructor_Name = "" #@param {type:"string"}
#@markdown 1. Introduces activities. Assigns roles to participants.
#@markdown 2. Responds to requests for help or clarification.
#@markdown 3. Collects the Jupyter notebooks from the Recorder and the Evaluator.
#@markdown ---

#@markdown ### Enter Facilitator Name:
Facilitator_Name = "" #@param {type:"string"}
#@markdown ### Enter Backup Facilitator Name:
Backup_Facilitator_Name = "" #@param {type:"string"}
#@markdown 1. Reads each question aloud and asks for volunteers to answer. If nobody volunteers, the Facilitator starts the discussion and asks one participant after another for comments, solutions, answers, or clarifications. When a majority of participants agree, the Facilitator asks the Recorder to record the answer. The Facilitator also coordinates discussion of the code execution and its output, treating it like any other question.
#@markdown 2. Involves each participant equally in the discussions.
#@markdown 3. Turns the coordinating role over to the Evaluator after finishing each activity.
#@markdown ---

#@markdown ### Enter Recorder Name:
Recorder_Name = "" #@param {type:"string"}
#@markdown ### Enter Backup Recorder Name:
Backup_Recorder_Name = "" #@param {type:"string"}
#@markdown 1. Coordinates virtual screen access if this is an online activity. Displays his/her screen while questions are being asked, and gives screen-sharing access as requested.
#@markdown 2. Records all answers for each question inside the Jupyter notebook.
#@markdown 3. Uses "Run all" in the "Runtime" menu and then saves the Jupyter notebook with all answers and results.
#@markdown 4. Submits the Jupyter notebook with all answers and the results of running the code.
#@markdown ---

#@markdown ### Enter Evaluator Name:
Evaluator_Name = "" #@param {type:"string"}
#@markdown ### Enter Backup Evaluator Name:
Backup_Evaluator_Name = "" #@param {type:"string"}
#@markdown 1.	Keeps track of time for each designated Activity.
#@markdown 2. After each activity, leads the discussion about the material and collects feedback in the form of the table below.
#@markdown 3. Submits the Jupyter notebook with all comments and results of the discussion at the end of each activity.
#@markdown ---

#@markdown ### Enter Participant names
Participant_4_Name = "" #@param {type:"string"}
Participant_5_Name = "" #@param {type:"string"}
Participant_6_Name = "" #@param {type:"string"}
Participant_7_Name = "" #@param {type:"string"}
Participant_8_Name = "" #@param {type:"string"}

#@markdown 1. Participates actively in teamwork to answer all questions.
#@markdown 2.	Executes the code and shares the comments.
#@markdown ---

print("You have chosen {} as the Instructor".format(Instructor_Name))

In [None]:
#@title Download files to work on (press "run" button)
import urllib.request  # the request submodule must be imported explicitly
import os
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/sbhattac/nsf-tip-ai-trust/main/"
filename = "modified_police_public_contact_survey_traffic_stop_2005_2015.csv"
print("Downloading", filename)
url = DOWNLOAD_ROOT + filename
urllib.request.urlretrieve(url, filename)

In [2]:
#@title Library Imports (press "run" button)
import pandas as pd  #imports the "pandas" library
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_graphviz, plot_tree
from sklearn.metrics import accuracy_score
import numpy as np
from IPython.display import SVG, display
from graphviz import Source
from ipywidgets import interactive, fixed
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn import tree
import re
import matplotlib

## Background Information:

You have been given access to the file named <br><b><i> modified_police_public_contact_survey_traffic_stop_2005_2015.csv </i></b>
<br> which is used in the research publication referenced above.

The data is from a survey related to police public contact, specifically focusing on traffic stops from the years 2005 to 2015. Here's a breakdown of the columns in the dataset:
* Policebehaviorproperly: Indicates whether the police behaved properly during the interaction (binary: 0 for "no" or 1 for "yes").
* Gender: The gender of the person stopped (binary: 0 for "female" or 1 for "male").
* Age: The age of the person stopped (numeric, ranging from 16 to 90).
* Arrested: Indicates whether the person was arrested (binary: 0 for "no" or 1 for "yes").
* Ticketed: Indicates whether the person was ticketed (binary: 0 for "no" or 1 for "yes").
* Warning: Indicates whether the person was given a warning (binary: 0 for "no" or 1 for "yes").
* Noformal: Indicates whether there was no formal action taken (binary: 0 for "no" or 1 for "yes").
* Reasonforstop: Indicates whether the officer(s) gave a reason for stopping the vehicle (binary: 0 for "no" or 1 for "yes").
* Legitimatestop: Indicates whether the stop was considered legitimate by the person who was stopped (binary: 0 for "no" or 1 for "yes").
* Numberofofficers: The number of officers who were present at the stop (binary: 0 for one officer or 1 for two or more officers).
* Policeuseofforce: Indicates whether the police used force during the stop (binary: 0 for "no" or 1 for "yes").
* Race: White, Black, Asian, or Other for the race of the person stopped.

Each row in the dataset represents an individual traffic stop incident, with various attributes recorded. The data can be used to analyze trends and patterns in police behavior, demographic details of those stopped, and outcomes of traffic stops over this 10-year period.
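
Race is the only column described above with text labels rather than numbers, and the decision tree code in this notebook converts such columns with one-hot encoding. The sketch below illustrates the idea in plain Python, assuming Race is stored as the four text labels listed above. The sample values are made up for illustration, and the `Race_<label>` column names follow the naming pattern that scikit-learn's `OneHotEncoder.get_feature_names_out` produces.

```python
# Hypothetical sample of Race values; the real values come from the CSV file.
races = ["White", "Black", "Asian", "Other", "White"]

# Sorted list of the distinct categories found in the sample.
categories = sorted(set(races))


def one_hot(value, categories):
    """Return one 0/1 entry per category, named like 'Race_Asian'."""
    return {f"Race_{c}": int(value == c) for c in categories}


# Encode every sample value; exactly one entry per row is set to 1.
encoded_rows = [one_hot(r, categories) for r in races]

for original, encoded in zip(races, encoded_rows):
    print(original, "->", encoded)
```

Each original value becomes four binary columns, so a tree can ask questions such as "is Race_Black <= 0.5?" in the same way it does for the other binary variables.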


In [None]:
#@title Decision Tree on Full Dataset with Depth Limit 3 (press "run" button)

# Loading the dataset from the downloaded CSV file
data = pd.read_csv("modified_police_public_contact_survey_traffic_stop_2005_2015.csv")

# Identifying categorical columns for encoding
categorical_cols = data.select_dtypes(include=['object']).columns

# Applying one-hot encoding to categorical columns
# Note: `sparse_output` replaces the older `sparse` argument in scikit-learn 1.2 and later
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoded_data = pd.DataFrame(encoder.fit_transform(data[categorical_cols]),
                            columns=encoder.get_feature_names_out(categorical_cols))

# Merging the encoded columns back with the original data
data_encoded = data.drop(categorical_cols, axis=1)
data_encoded = pd.concat([data_encoded, encoded_data], axis=1)

# Handling NaN values by filling with median
data_encoded = data_encoded.fillna(data_encoded.median())

# Preparing the data for the decision tree model
X = data_encoded.drop(['Policebehaviorproperly'], axis=1)  # Features
y = data_encoded['Policebehaviorproperly']  # Target variable

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating the Decision Tree model
dtree = DecisionTreeClassifier(max_depth=3)  # IF YOU WANT TO CHANGE THE TREE DEPTH REPLACE 3 WITH SOME OTHER VALUE LIKE 5
dtree.fit(X_train, y_train)

# Predicting the test set results and building the classification report (precision, recall, F1-score, accuracy)
y_pred = dtree.predict(X_test)
report = classification_report(y_test, y_pred)

fig, ax = plt.subplots(figsize=(20,15))

# Plotting the Decision Tree
#plt.figure(figsize=(20,10))
#plot_tree(dtree, filled=True, feature_names=X.columns, class_names=['Improper', 'Proper'])
plot_tree(dtree, ax=ax, filled=True, feature_names=list(X.columns), class_names=['Improper', 'Proper'])

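def replace_text(obj):
    # Keep only the split condition in each decision node by removing the
    # gini line and everything that follows it
    if isinstance(obj, matplotlib.text.Annotation):
        txt = obj.get_text()
        txt = re.sub("\ngini[^$]*", "", txt)
        obj.set_text(txt)
    return obj

# Apply the text clean-up to every annotation drawn on the axes
for child in ax.get_children():
    replace_text(child)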

plt.show()

#report
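
The cell above stores a classification report in `report` but leaves it undisplayed. As a rough guide to what such a report contains, the sketch below computes accuracy, precision, and recall by hand in plain Python. The label lists are made up for illustration (1 = Proper, 0 = Improper) and are not results from the dataset; `y_true_demo` and `y_pred_demo` are hypothetical names.

```python
# Hypothetical true labels and predictions (1 = Proper, 0 = Improper).
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 1, 0, 0, 1, 1, 0, 1]

pairs = list(zip(y_true_demo, y_pred_demo))

# Count the four cells of the confusion matrix for the "Proper" class.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted "Proper" that are truly "Proper"
recall = tp / (tp + fn)             # share of true "Proper" that the model found

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

To see the real figures for the trained tree, one option is to run `print(report)` in a new cell after the cell above has finished.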


In [None]:
#@title Discussion Questions 1 - 12 (recorder will write and save answers)
#@markdown 1. The diagram is a decision tree (it is actually an upside-down tree). All boxes are referred to as nodes. How many nodes do we have in this diagram?
activity1_answer1 = "" #@param {type:"string"}
#@markdown 2. We refer to the single node at the top of the diagram as the root. Decision trees are usually multi-level. We start counting from the top of the diagram but do not include the root. How many levels does this decision tree have?
activity1_answer2 = "" #@param {type:"string"}
#@markdown 3. The nodes can be connected to other nodes on a lower level. How many such nodes are in this diagram?
activity1_answer3 = "" #@param {type:"string"}
#@markdown 4. The connections are called branches of the tree.  How many branches are in this diagram?
activity1_answer4 = "" #@param {type:"string"}
#@markdown 5. Nodes that grow two branches ("Yes" and "No") are called decision boxes. The left branch is the "Yes" decision and the right branch is the "No" decision. They represent a selection condition of a variable in the police contact data. What is the selection condition for the decision box in the root?
activity1_answer5 = "" #@param {type:"string"}
#@markdown 6. We also refer to a selection condition as a "predictor", since we use it for predictions. How many predictors do we use in this diagram?
activity1_answer6 = "" #@param {type:"string"}
#@markdown 7. The nodes that do not grow any branches are called "leaves" and represent classes of the predicted variable Policebehaviorproperly. How many leaves do we have in the diagram?
activity1_answer7 = "" #@param {type:"string"}
#@markdown 8. Can you predict Policebehaviorproperly (whether it was Proper or Improper) when it was not a legitimate stop, the police did not use force, and police did not give a ticket?
activity1_answer8 = "" #@param {type:"string"}
#@markdown 9. Can you predict Policebehaviorproperly when it was not a legitimate stop and the police did use force?
activity1_answer9 = "" #@param {type:"string"}
#@markdown 10. For the previous question, does age seem to matter?
activity1_answer10 = "" #@param {type:"string"}
#@markdown 11. Can you predict Policebehaviorproperly when it was a legitimate stop and the police did not use force? Does it matter whether a warning was given or not?
activity1_answer11 = "" #@param {type:"string"}
#@markdown 12. Do we always need to check all five predictors to determine the value of Policebehaviorproperly?
activity1_answer12 = "" #@param {type:"string"}
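
The vocabulary used in the questions above (nodes, leaves, branches, levels, and following a path to a prediction) can be tried out on a much smaller tree. The sketch below uses a made-up toy tree with hypothetical conditions `X1 <= 0.5` and `X2 <= 0.5` and hypothetical classes `A` and `B`; it is not the tree drawn above, and its counts are not answers to the questions.

```python
# A toy tree as nested dictionaries: decision boxes have a condition and
# "yes"/"no" branches, leaves only have a class label. All names are made up.
toy_tree = {
    "condition": "X1 <= 0.5",
    "yes": {
        "condition": "X2 <= 0.5",
        "yes": {"label": "A"},
        "no": {"label": "B"},
    },
    "no": {"label": "A"},
}


def is_leaf(node):
    return "label" in node


def count_nodes(node):
    """Count every box in the tree, decision boxes and leaves alike."""
    if is_leaf(node):
        return 1
    return 1 + count_nodes(node["yes"]) + count_nodes(node["no"])


def count_leaves(node):
    """Count the boxes that do not grow any branches."""
    if is_leaf(node):
        return 1
    return count_leaves(node["yes"]) + count_leaves(node["no"])


def count_levels(node):
    """Count the levels below this node (the root itself is not counted)."""
    if is_leaf(node):
        return 0
    return 1 + max(count_levels(node["yes"]), count_levels(node["no"]))


def predict(node, answers):
    """Follow the yes/no answers from the root until a leaf is reached."""
    while not is_leaf(node):
        node = node["yes"] if answers[node["condition"]] else node["no"]
    return node["label"]


nodes = count_nodes(toy_tree)
leaves = count_leaves(toy_tree)
branches = nodes - 1  # every node except the root hangs from exactly one branch
levels = count_levels(toy_tree)

print("nodes:", nodes, "leaves:", leaves, "branches:", branches, "levels:", levels)
print(predict(toy_tree, {"X1 <= 0.5": True, "X2 <= 0.5": False}))
print(predict(toy_tree, {"X1 <= 0.5": False}))  # X2 is never checked on this path
```

The last call reaches a leaf after a single check, which shows that a path through a tree does not have to use every predictor.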

In [None]:
#@title Take a look at a random sample of rows from the dataset / table (press "run" button)

# Load your dataset
data = pd.read_csv("modified_police_public_contact_survey_traffic_stop_2005_2015.csv")

N_Samples = 10

random_selection = data.sample(n=N_Samples)

# Display the randomly selected rows of the dataset
random_selection.head(N_Samples)


In [None]:
#@title Discussion Questions 13 - 14 (recorder will write and save answers)
#@markdown 13. What you see above are 10 rows randomly selected from the dataset (or a different number of rows if you change N_Samples in the code). Based on this view of an extremely small part of the data (the full dataset has more than 15,000 rows), what variable would you choose for the root? You do not need to consider Age in your discussions.
activity1_answer13 = "" #@param {type:"string"}
#@markdown 14. What was the basis of choosing that variable for root in the previous question?
activity1_answer14 = "" #@param {type:"string"}
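
After the group has recorded its own answers, it may help to see one criterion a decision tree can use to compare candidate root variables. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default, which is the "gini" value that appears in the plotted tree's node text. The sketch below computes the weighted Gini impurity of a split in plain Python. The rows and the variable names `F1`, `F2`, and `Target` are made up for illustration and are not taken from the dataset.

```python
# Hypothetical rows with two binary candidate variables and a binary target.
rows = [
    {"F1": 1, "F2": 0, "Target": 1},
    {"F1": 1, "F2": 1, "Target": 1},
    {"F1": 1, "F2": 0, "Target": 1},
    {"F1": 1, "F2": 1, "Target": 1},
    {"F1": 0, "F2": 0, "Target": 0},
    {"F1": 0, "F2": 1, "Target": 0},
    {"F1": 0, "F2": 0, "Target": 1},
    {"F1": 0, "F2": 1, "Target": 0},
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels: 0 means all labels agree."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def weighted_gini(rows, feature, target="Target"):
    """Impurity left after splitting the rows on a binary feature."""
    group0 = [r[target] for r in rows if r[feature] == 0]
    group1 = [r[target] for r in rows if r[feature] == 1]
    n = len(rows)
    return len(group0) / n * gini(group0) + len(group1) / n * gini(group1)


scores = {f: weighted_gini(rows, f) for f in ("F1", "F2")}
best = min(scores, key=scores.get)  # lower impurity means a cleaner split

print(scores)
print("best candidate for the root:", best)
```

In this toy sample, splitting on `F1` leaves the groups less mixed than splitting on `F2`, so `F1` would be preferred for the root under this criterion.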