# The Data

Researchers at the University of California, Berkeley's Human Rights Center organized and cleaned public immigration data from the Executive Office for Immigration Review (EOIR) and judge data from the Transactional Records Access Clearinghouse (TRAC). 

They then shared their cleaned data with us. 

The data was pulled for January 2022.  


## Assumptions and limitations

Proceedings that did not have a final decision, which made up about 41% of the initial data pull, were stripped from the data. The researchers say "this could be due to an oversight in recording, the data being corrupted or other reasons." The data does not include the asylum seekers' explanations for leaving their countries.

Later in the notebook, we checked against another spreadsheet (one that did not have dropped cases) to see how many credible fear review cases and cases involving Cameroonians were dropped. The numbers were small and would have a negligible effect on the analysis: only one dropped case involved Cameroonians, and the largest share of dropped credible fear review cases was in 2021, at 3.7 percent.


## What we learned

We affirmed the data in this line: A previous analysis by the students of the Human Rights Center Investigations Lab showed that immigration judges upheld more than 72 percent of all credible fear determinations they reviewed between 2018 and 2021.
The line this pertains to in the story: "...showing that immigration judges nationally upheld 72.23% of all credible fear determinations they reviewed between 2018 and 2021." 
The analysis for this can be found under the question: "What is the national rate for credible fear reviews that side with ICE?"

We also found that Judge Landis, the judge who ruled that BJ's fears were not credible, upheld 99.3 percent of the negative determinations that came before him during that time, keeping the Xs in place for all but 2 out of 314 asylum seekers. Between 2019 and 2021, Landis upheld all negative credible fear determinations that he reviewed.
The analysis for this can be found under the question: "What was judge Landis rate for siding with ICE on credible fear reviews after 2018 (only looking at 2019, 2020 and 2021)?"

The analysis for this line: "Between 2018 and 2021, Louisiana judges upheld 93.7% of all negative credible fear determination that came before them in reviews."
...can be found in the section "What is the Louisiana rate for credible fear review that side with ICE?"


## Cleaning the data

The NATIONALITY column, which denotes the origin country for the asylum seeker, had an error for Cameroonians where an extra character was added. One of the researchers removed the character for us. 
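
The exact stray character is not recorded in this notebook. As a rough illustration only, assuming the country code is always two uppercase letters and the stray character trails it, a cleanup along these lines would normalize the column. The sample values below are made up.

```python
import pandas as pd

# Hypothetical sample: the real stray character is not documented here
nationality = pd.Series(["CM", "CM?", "HO"])

# Keep only the leading two-letter code and discard anything after it
cleaned = nationality.str.extract(r"^([A-Z]{2})", expand=False)
print(cleaned.tolist())
```

This is a sketch of one possible approach, not the researchers' actual fix.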

## Importing Data 

'data_final.csv' is the cleaned data provided to us by the researchers. We used the Python library pandas to analyze the data. Due to the size of the file, we are only pulling in specific columns to make the processing easier. Here are the columns and what each denotes, based on a codebook provided by the researchers:

- NATIONALITY: Nationality of the asylum seeker.
- DECISION_DATE_YEAR: The year in which the immigration judge rendered a decision on the proceeding
- CHARGING_DOC_DATE_YEAR: The year the Department of Homeland Security issued the charging document to the asylum seeker
- BASE_CITY: The code that represents the immigration court having jurisdiction over the assigned hearing location
- DECISION: Whether the asylum seeker was granted asylum
- JUDGE_CODE: The code that represents the immigration judge assigned to the case
- CASE_TYPE_CFR: Credible Fear Review

In [1]:
import pandas as pd
cols = ['NATIONALITY', 'DECISION_DATE_YEAR', 'CHARGING_DOC_DATE_YEAR', 'BASE_CITY', 'DECISION', 'JUDGE_CODE', 'CASE_TYPE_CFR']
df = pd.read_csv('data_final.csv', usecols=cols)


# Helper that filters the data for each question below
def filter_data(df, la_only, cfr_only, cm_only):
    # If True, only look at Louisiana courts
    if la_only:
        LOU = ['NOL', 'OAK', 'JNA']
        # Keep only cases heard in Louisiana
        df = df[df["BASE_CITY"].isin(LOU)]
    # If True, only look at credible fear reviews
    if cfr_only:
        df = df[df["CASE_TYPE_CFR"]==1]
    # If True, only look at cases involving Cameroonians
    if cm_only:
        df = df[df["NATIONALITY"]=="CM"]
    # Our analysis only covers 2018, 2019, 2020 and 2021
    df = df[df["DECISION_DATE_YEAR"].isin([2018,2019,2020,2021])]
    return df
df.head(5)


Unnamed: 0,BASE_CITY,JUDGE_CODE,NATIONALITY,CASE_TYPE_CFR,CHARGING_DOC_DATE_YEAR,DECISION_DATE_YEAR,DECISION
0,CHI,ESS,HO,0,2016,2017,0
1,PSD,,HO,0,2017,2017,1
2,SND,LOC,VE,0,2020,2020,0
3,SNA,YG1,EC,0,2020,2020,0
4,SND,SS2,GT,0,2020,2020,0
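
To show what each flag in the filter function does, here is a minimal sketch that runs a compact copy of the function on an invented five-column toy table. The rows are made up for illustration. Note that the 2018-2021 year filter always applies, whichever flags are set.

```python
import pandas as pd

# Compact copy of the notebook's filter so this example runs on its own
def filter_data(df, la_only, cfr_only, cm_only):
    if la_only:
        df = df[df["BASE_CITY"].isin(["NOL", "OAK", "JNA"])]
    if cfr_only:
        df = df[df["CASE_TYPE_CFR"] == 1]
    if cm_only:
        df = df[df["NATIONALITY"] == "CM"]
    return df[df["DECISION_DATE_YEAR"].isin([2018, 2019, 2020, 2021])]

# Invented rows, not taken from data_final.csv
toy = pd.DataFrame({
    "BASE_CITY": ["NOL", "CHI", "OAK", "JNA"],
    "CASE_TYPE_CFR": [1, 1, 0, 1],
    "NATIONALITY": ["CM", "CM", "HO", "GT"],
    "DECISION_DATE_YEAR": [2019, 2020, 2021, 2017],
    "DECISION": [0, 1, 0, 0],
})

all_rows = filter_data(toy, la_only=False, cfr_only=False, cm_only=False)
la_cfr = filter_data(toy, la_only=True, cfr_only=True, cm_only=False)
cm_cfr = filter_data(toy, la_only=False, cfr_only=True, cm_only=True)
print(len(all_rows), len(la_cfr), len(cm_cfr))
```

With no flags set, only the 2017 row is removed. The Louisiana credible fear filter keeps a single row, and the Cameroonian credible fear filter keeps two.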


# Questions we asked the data (all for 2018-2021)

## Credible fear review rates

### What is the national rate for credible fear reviews that side with ICE?

To find the answer, we filtered by credible fear review only. We also wrote a function to generate the rate. In the DECISION column, a 1 indicates the judge sided with the asylum seeker, while a 0 indicates that the judge sided with ICE. Because the column holds only 0s and 1s, its mean is the share of cases decided for the asylum seeker. To get the rate of siding with ICE, we subtracted that mean from 1 and multiplied the result by 100. 

In [25]:
def get_rate(df):
    rate = (1-df["DECISION"].mean())*100
    return rate

tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=False)
rate = get_rate(tdf)
print("The national rate for credible fear review that side with ICE is {}%.".format(rate.round(2)))

The national rate for credible fear review that side with ICE is 72.23%.
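
To see why one minus the mean gives the share of cases that side with ICE, here is a small sketch on four invented decisions (three for ICE, one for the asylum seeker). It also shows that counting the zeros directly gives the same answer.

```python
import pandas as pd

# Invented decisions: 0 = sided with ICE, 1 = sided with the asylum seeker
toy = pd.DataFrame({"DECISION": [0, 0, 0, 1]})

# The mean of a 0/1 column is the share of 1s
share_for_seeker = toy["DECISION"].mean()
rate_for_ice = (1 - share_for_seeker) * 100

# Equivalent check: the share of rows equal to 0
direct_rate = (toy["DECISION"] == 0).mean() * 100
print(rate_for_ice, direct_rate)
```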


### What is the Louisiana rate for credible fear review that side with ICE?

In [3]:
tdf = filter_data(df, la_only=True, cfr_only=True, cm_only=False)
rate = get_rate(tdf)
print("The Louisiana rate for credible fear review that side with ICE is {}%.".format(rate.round(2)))

The Louisiana rate for credible fear review that side with ICE is 93.71%.


### What is the national rate for credible fear review siding with ICE, filtered by Cameroonians?

In [23]:
tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=True)
rate = get_rate(tdf)
print("The Cameroonian national rate for credible fear review that side with ICE is {}%.".format(rate.round(2)))

The Cameroonian national rate for credible fear review that side with ICE is 32.41%.


### What is the Louisiana rate for credible fear review siding with ICE, filtered by Cameroonians?

In [24]:
tdf = filter_data(df, la_only=True, cfr_only=True, cm_only=True)
rate = get_rate(tdf)
print("The Cameroonian Louisiana rate for credible fear review that side with ICE is {}%.".format(rate.round(2)))

The Cameroonian Louisiana rate for credible fear review that side with ICE is 89.58%.


### What were the raw numbers for Cameroonian Louisiana cases?

In [44]:
tdf = filter_data(df, la_only=True, cfr_only=True, cm_only=True)
tdf["DECISION"].value_counts()

0    43
1     5
Name: DECISION, dtype: int64
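
As a quick arithmetic check, the raw counts above reproduce the 89.58% rate reported for Cameroonian credible fear reviews in Louisiana.

```python
# Counts taken from the value_counts output above
sided_with_ice = 43
sided_with_seeker = 5

rate = round(sided_with_ice / (sided_with_ice + sided_with_seeker) * 100, 2)
print(rate)
```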

## Number of cases

### How many credible fear review cases were there nationally between 2018 and 2021?
For this, I filtered by credible fear reviews and used the len function to count the number of rows.

In [6]:
cases = len(filter_data(df, la_only=False, cfr_only=True, cm_only=False))
print("There are {} credible fear review cases nationally.".format(cases))

There are 46475 credible fear review cases nationally.


### How many credible fear review cases were there in Louisiana between 2018 and 2021?


In [7]:
cases = len(filter_data(df, la_only=True, cfr_only=True, cm_only=False))
print("There are {} credible fear review cases in Louisiana.".format(cases))

There are 4152 credible fear review cases in Louisiana.


### How many credible fear review cases were there nationally for Cameroonians?


In [8]:
cases = len(filter_data(df, la_only=False, cfr_only=True, cm_only=True))
print("There are {} credible fear review cases nationally involving Cameroonians.".format(cases))

There are 324 credible fear review cases nationally involving Cameroonians.


### How many credible fear review cases were there in Louisiana for Cameroonians?


In [9]:
cases = len(filter_data(df, la_only=True, cfr_only=True, cm_only=True))
print("There are {} credible fear review cases in Louisiana involving Cameroonians.".format(cases))

There are 48 credible fear review cases in Louisiana involving Cameroonians.


## Looking at judges

### How many credible fear review cases did Judge Landis have?

To do this, I used the previously outlined method for getting the number of cases (checking the number of rows) after filtering the JUDGE_CODE column by "BHL", which is Brent Landis' code.

In [10]:
tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=False)
cases = len(tdf[tdf["JUDGE_CODE"]=="BHL"])
print("Judge Landis made decisions on {} credible fear review cases.".format(cases))

Judge Landis made decisions on 314 credible fear review cases.


### What was Judge Landis' rate for siding with ICE?

In [11]:
tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=False)
tdf = tdf[tdf["JUDGE_CODE"]=="BHL"]
rate = get_rate(tdf)
print("Judge Landis' rate for siding with ICE on credible fear reviews was {}%.".format(rate.round(2)))

Judge Landis' rate for siding with ICE on credible fear reviews was 99.36%.
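
The rate above can be reproduced by hand from the counts quoted earlier in this notebook: 314 credible fear reviews, with all but 2 decided for ICE.

```python
# Counts quoted in the notebook: 314 reviews, 2 decided for the asylum seeker
total_reviews = 314
sided_with_seeker = 2

rate = round((total_reviews - sided_with_seeker) / total_reviews * 100, 2)
print(rate)
```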


### How many credible fear review cases involving Cameroonians did Judge Landis have?


In [12]:
tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=True)
cases = len(tdf[tdf["JUDGE_CODE"]=="BHL"])
print("Judge Landis made decisions on {} credible fear review cases involving Cameroonians.".format(cases))

Judge Landis made decisions on 11 credible fear review cases involving Cameroonians.


### What was judge Landis rate for siding with ICE on credible fear reviews after 2018 (only looking at 2019, 2020 and 2021)?


In [13]:
tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=False)
tdf = tdf[tdf["DECISION_DATE_YEAR"].isin([2019,2020,2021])]
tdf = tdf[tdf["JUDGE_CODE"]=="BHL"]
rate = get_rate(tdf)
print("Judge Landis' rate for siding with ICE on credible fear reviews in 2019, 2020 and 2021 was {}%".format(rate))

Judge Landis' rate for siding with ICE on credible fear reviews in 2019, 2020 and 2021 was 100.0%


# Checking the data for dropped Cameroonian cases

When cleaning the data, Berkeley researchers removed all rows that did not have a final decision. To see how many dropped cases involved Cameroonians, we compared merged.csv (provided to us by the researchers, who said this spreadsheet did not filter out cases with missing decisions) with data_final.csv, which we did our analysis on. 

When comparing the years 2018, 2019, 2020 and 2021, only one case was missing – not enough to alter our analysis. 

### Importing the data

In [33]:
cols = ["A_NAT","A_CASE_TYPE","B_COMP_DATE"]
merged = pd.read_csv('merged.csv', usecols=cols)




Unnamed: 0,B_COMP_DATE,A_NAT,A_CASE_TYPE
3365,2021-01-11,CM,CFR
10724,2020-05-05,CM,CFR
67815,2016-02-04,CM,CFR
194825,2016-03-07,CM,CFR
528895,2016-04-05,CM,CFR


## First, a quick fact check: 41 percent of cases were dropped between data sets
To confirm this, we took the length of both data frames and calculated the percent difference.

In [45]:
(len(df)-len(merged))/len(merged)*100

-41.02452449034137

## Writing the function to compare data_final.csv and merged.csv

The function filters the merged.csv and data_final.csv data so they include only credible fear reviews involving Cameroonians. It also filters by the given year. The function then compares the number of rows in each and prints a report. Any difference between the two is noted; otherwise, the report says there were no dropped cases.  

Note that there was only one missing credible fear review case involving Cameroonians. 

In [46]:
def compare_spreadsheets_cm(year):
    # filtering data_final so we're only looking at credible fear review and Cameroonians. 
    tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=True)
    # filtering merged data so we're only looking at credible fear review and Cameroonians. 
    cm = merged[merged["A_NAT"]=="CM"]
    cfrcm = cm[cm["A_CASE_TYPE"]=="CFR"]
    # filter both by the year we care about
    universe = len(cfrcm[cfrcm["B_COMP_DATE"].str.contains(str(year))])
    cleaned = len(tdf[tdf["DECISION_DATE_YEAR"]==year])
    difference = universe-cleaned
    print("Looking at {}:".format(year))
    print("There are {} cases in merged.csv.".format(universe))
    print("There are {} cases in the df_final.csv.".format(cleaned))
    if difference == 0:
        print("Therefore, there were no dropped CM cases in {}.".format(year))
    else:
        print("That's a difference of {} case.".format(difference))


In [47]:
compare_spreadsheets_cm(2018)

Looking at 2018:
There are 8 cases in merged.csv.
There are 8 cases in the df_final.csv.
Therefore, there were no dropped CM cases in 2018.


In [49]:
compare_spreadsheets_cm(2019)

Looking at 2019:
There are 171 cases in merged.csv.
There are 170 cases in the df_final.csv.
That's a difference of 1 case.


In [50]:
compare_spreadsheets_cm(2020)

Looking at 2020:
There are 138 cases in merged.csv.
There are 138 cases in the df_final.csv.
Therefore, there were no dropped CM cases in 2020.


In [51]:
compare_spreadsheets_cm(2021)

Looking at 2021:
There are 8 cases in merged.csv.
There are 8 cases in the df_final.csv.
Therefore, there were no dropped CM cases in 2021.
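
The year match in the function above relies on the year appearing as text inside B_COMP_DATE. A slightly stricter sketch, assuming the dates are ISO-formatted strings like the ones in the merged.csv sample, parses the column and compares the year directly. The toy values are invented, and this is an alternative we did not run on the real data.

```python
import pandas as pd

# Invented dates in the same YYYY-MM-DD shape as B_COMP_DATE, with one missing
toy = pd.DataFrame({"B_COMP_DATE": ["2021-01-11", "2020-05-05", None, "2016-02-04"]})

# Unparseable or missing dates become NaT instead of raising an error
years = pd.to_datetime(toy["B_COMP_DATE"], errors="coerce").dt.year

# Missing years never equal the target, so they are simply not counted
count_2021 = int((years == 2021).sum())
print(count_2021)
```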


## Checking how many credible fear review cases were missing

Since our analysis was not restricted to Cameroonians, we compared the difference in credible fear review cases between the original (merged.csv) dataset and the one that had missing decision rows dropped (data_final.csv). 

The function below is set up in the same way as the one above – the only difference is that it no longer filters out non-Cameroonian cases. To run the function properly, I dropped 489 rows with missing years. However, those rows made up a little more than half a percent of the credible fear review rows and therefore would have virtually no impact on the analysis. 

For the years we looked at (2018, 2019, 2020, 2021), only a small percentage of cases were dropped. 2021 had the most, with 3.72% missing. That small a difference should not significantly affect our analysis. 

In [70]:
cfr_universe = merged[merged["A_CASE_TYPE"]=="CFR"]
# 489 rows had missing years so they were dropped. But they made up
# a little more than half a percent of the credible fear review rows
# so they were safe to drop without having a notable impact on the data
cfr_universe = cfr_universe[~cfr_universe["B_COMP_DATE"].isnull()]
def compare_spreadsheets_cfr(year):
    tdf = filter_data(df, la_only=False, cfr_only=True, cm_only=False)
    universe = len(cfr_universe[cfr_universe["B_COMP_DATE"].str.contains(str(year))])
    cleaned = len(tdf[tdf["DECISION_DATE_YEAR"]==year])
    difference = universe-cleaned
    print("Looking at {}:".format(year))
    print("There are {} cases in the universe spreadsheet.".format(universe))
    print("There are {} cases in the cleaned spreadsheet".format(cleaned))
    if difference == 0:
        print("Therefore, there were no dropped CFR cases in {}.".format(year))
    else:
        print("That's a difference of {} case.".format(difference))
        percent = ((cleaned-universe)/universe*100)
        print("That's a percent difference of {}.".format(percent))
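
Rows with a missing B_COMP_DATE have to be handled before the year match because `.str.contains` returns a missing value for them, and pandas will generally refuse to filter with a mask that contains missing values. An alternative sketch, not what we ran, is to pass `na=False` so missing dates count as non-matches. The toy values are invented.

```python
import pandas as pd

# Invented completion dates, one of them missing
dates = pd.Series(["2021-01-11", None, "2020-05-05"])

# na=False turns the missing entry into False instead of NaN
mask = dates.str.contains("2021", na=False)
matches = dates[mask]
print(mask.tolist(), len(matches))
```

Dropping the rows first, as the cell above does, gives the same counts; this version just keeps the missing rows in the table.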


In [64]:
compare_spreadsheets_cfr(2018)


Looking at 2018:
There are 7478 cases in the universe spreadsheet.
There are 7361 cases in the cleaned spreadsheet
That's a difference of 117 case.
That's a percent difference of -1.5645894624231076.


In [53]:
compare_spreadsheets_cfr(2019)


Looking at 2019:
There are 17266 cases in the universe spreadsheet.
There are 17145 cases in the cleaned spreadsheet
That's a difference of 121 case.
That's a percent difference of -0.7007992586586355.


In [54]:
compare_spreadsheets_cfr(2020)


Looking at 2020:
There are 8336 cases in the universe spreadsheet.
There are 8189 cases in the cleaned spreadsheet
That's a difference of 147 case.
That's a percent difference of -1.7634357005758157.


In [55]:
compare_spreadsheets_cfr(2021)


Looking at 2021:
There are 14313 cases in the universe spreadsheet.
There are 13780 cases in the cleaned spreadsheet
That's a difference of 533 case.
That's a percent difference of -3.7238873751135335.
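
As a final cross-check, the sketch below recomputes the percent differences from the counts printed above and confirms two things: 2021 has the largest share of dropped cases, and the four cleaned yearly counts add up to the 46,475 national credible fear review cases reported earlier.

```python
# (universe, cleaned) counts copied from the printed reports above
counts = {
    2018: (7478, 7361),
    2019: (17266, 17145),
    2020: (8336, 8189),
    2021: (14313, 13780),
}

# Percent difference between the cleaned and original counts for each year
pct = {
    year: round((cleaned - universe) / universe * 100, 2)
    for year, (universe, cleaned) in counts.items()
}

# The year with the most negative value lost the largest share of cases
worst_year = min(pct, key=pct.get)
total_cleaned = sum(cleaned for _, cleaned in counts.values())
print(pct, worst_year, total_cleaned)
```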
