<br><br>
## Now, a 667-long JSON file
### or a list of 667 case dictionaries
My next steps are to build a dataframe with the name of each case, whether it was granted or denied (regex), which country the plaintiff is from (regex), and the list of core terms I already have.
<br><br>

In [1]:
import json
import re
import pandas as pd
with open('asylum_data_complete.json',"r") as f:
    all_asylum_cases = json.load(f)

#### First, finding the decisions and countries with regex

In [2]:
# This searches for either grant, or granted. 
# (although all the cases I've seen use granted, 
# I don't want to miss grant in case anyone uses it)
granted = r"\bGRANT\w{0,2}"

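# This searches for either DENY or DENIED.
# One of the two endings is required, so other words
# that merely start with DEN do not count as a denial.
denied = r"\bDEN(?:Y|IED)"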
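A quick check of how the decision patterns behave can catch surprises before they reach the dataframe. The sketch below runs the grant pattern and two spellings of the denial pattern (one with both endings optional, one that requires `Y` or `IED`) against a few made-up strings. The sample sentences and the names `loose_denied` and `tight_denied` are illustrative assumptions, not taken from the case data.

```python
import re

granted = r"\bGRANT\w{0,2}"
loose_denied = r"\bDENY?(IED)?"   # both endings optional, so bare "DEN" matches
tight_denied = r"\bDEN(?:Y|IED)"  # requires DENY or DENIED

# Made-up snippets, only meant to exercise the patterns.
samples = [
    "The petition for review is GRANTED.",
    "The petition for review is DENIED.",
    "We DENY the motion.",
    "A DENSE record.",          # not a denial, but it starts with DEN
    "The petition is denied.",  # lowercase: the patterns are case-sensitive
]

results = {}
for text in samples:
    results[text] = (
        bool(re.search(granted, text)),
        bool(re.search(loose_denied, text)),
        bool(re.search(tight_denied, text)),
    )
    print(text, results[text])
```

Because the patterns are case-sensitive, they only pick up capitalised dispositions. Adding `re.IGNORECASE` would also match ordinary uses of "granted" and "denied" in the body of an opinion, which is probably not wanted here.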


In [3]:
# The pattern for country of citizenship is "citizen(s) of [country]"
# (found in the second item of the opinion).
# The country can be several words long,
# but the phrase describing where someone is from
# seems to always be followed by a comma.
country = r"citizens?\s[\w\W]*of\s[\w\W]*[A-Z][^0-9]+,"
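The pattern above uses greedy `[\w\W]*`, so each match runs on to a late comma in the opinion (the spans in the output further down cover tens of thousands of characters), and only the start of the match is useful. As an alternative, here is a minimal sketch of a pattern that captures just the country phrase in a group. `NAME`, `COUNTRY`, and `extract_country` are hypothetical names, and the sample sentences are modelled on the phrases seen in this notebook. It has not been run against the full dataset.

```python
import re

# One capitalised word, allowing apostrophes and hyphens (e.g. "People's").
NAME = r"[A-Z][\w'-]*"

# "citizen(s) of [both] [the] Name", followed by any number of further
# capitalised words, optionally joined by "of", "and", and "the".
COUNTRY = re.compile(
    rf"citizens?\s+of\s+(?:both\s+)?(?:the\s+)?"
    rf"({NAME}(?:\s+(?:(?:of|and)\s+)?(?:the\s+)?{NAME})*)"
)


def extract_country(text):
    """Return the captured country phrase, or None when nothing matches."""
    match = COUNTRY.search(text)
    return match.group(1) if match else None


examples = [
    "a native and citizen of Ecuador, entered the United States",
    "citizens of both Honduras and Nicaragua. They applied",
    "citizen of the Democratic Republic of the Congo, seeks review",
    "citizen of the People's Republic of China, illegally entered",
    "citizen of India who entered",
    "no citizenship phrase here",
]
for sentence in examples:
    print(repr(extract_country(sentence)))
```

Unlike a word-by-word scan that skips connecting words, this keeps "of" and "and" inside the captured phrase, so names come out as "People's Republic of China" rather than "People's Republic China".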

In [4]:
asylum_cases_organized = []

In [5]:
for asylum_case in all_asylum_cases:
    case = {}
    case["name"] = asylum_case["name"]
    conclusion = asylum_case["opinion"]
    if (re.search(granted, conclusion)):
        case["decision"] = "granted"
    elif (re.search(denied, conclusion)):
        case["decision"] = "denied"
    else:
        case["decision"] = None
        
    citizen = asylum_case["opinion"]
    case["country"] = re.search(country, citizen)
    case["terms"]  = asylum_case["terms"]
    case["op_length"] = len(asylum_case["opinion"])
    asylum_cases_organized.append(case)
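Because the loop tests the grant pattern first, an opinion that contains both a grant and a denial is labelled "granted". A minimal sketch of a classifier that flags those mixed opinions for manual review instead is shown below. `classify_decision` and the "mixed" label are hypothetical, the denial pattern is a stricter variant that requires DENY or DENIED, and the test sentences are made up.

```python
import re

GRANTED = re.compile(r"\bGRANT\w{0,2}")
DENIED = re.compile(r"\bDEN(?:Y|IED)")


def classify_decision(opinion):
    """Return 'granted', 'denied', 'mixed', or None for an opinion text."""
    has_grant = GRANTED.search(opinion) is not None
    has_deny = DENIED.search(opinion) is not None
    if has_grant and has_deny:
        return "mixed"  # both words appear: worth reading by hand
    if has_grant:
        return "granted"
    if has_deny:
        return "denied"
    return None


print(classify_decision("The petition is GRANTED in part and DENIED in part."))
print(classify_decision("The petition for review is DENIED."))
print(classify_decision("No disposition is stated in this text."))
```

Filtering the dataframe on the "mixed" label would give a shortlist of cases to read, rather than finding partially granted cases one by one.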

In [6]:
asylum_cases_organized

[{'name': 'Zepeda-Lopez v. Garland, 38 F.4th 315',
  'decision': 'granted',
  'country': <re.Match object; span=(1027, 27063), match='citizens of both Honduras and Nicaragua. They app>,
  'terms': ['refugee',
   ' <span name="SH_2355451274" class="SS_prior SS_SH">asylum</span>',
   ' nationality',
   ' dual',
   ' persecution',
   ' citizenship',
   ' singular',
   ' removal',
   ' legislative history',
   ' resettlement',
   ' well-founded',
   ' withholding',
   ' unwilling',
   ' eligible',
   ' deference',
   ' qualify',
   ' refugee status',
   ' statutory text',
   ' Immigration',
   ' proceedings',
   ' policies',
   ' gang'],
  'op_length': 27169},
 {'name': 'Quituizaca v. Garland, 52 F.4th 103',
  'decision': 'denied',
  'country': <re.Match object; span=(1345, 40382), match='citizen of Ecuador, entered the United States in >,
  'terms': ['withholding',
   ' <span name="SH_2355451274" class="SS_prior SS_SH">asylum</span>',
   ' removal',
   ' motive',
   ' persecution',
   ' g

In [7]:
len(asylum_cases_organized)

667

#### Now I convert the country regex matches into country-name strings, manually check every case where the regex didn't work, and drop the cases that don't apply to my query

In [8]:
asylum_df = pd.DataFrame(asylum_cases_organized)

In [9]:
pd.options.display.max_colwidth = None

In [10]:
asylum_df

Unnamed: 0,name,decision,country,terms,op_length
0,"Zepeda-Lopez v. Garland, 38 F.4th 315",granted,"<re.Match object; span=(1027, 27063), match='citizens of both Honduras and Nicaragua. They app>","[refugee, <span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, nationality, dual, persecution, citizenship, singular, removal, legislative history, resettlement, well-founded, withholding, unwilling, eligible, deference, qualify, refugee status, statutory text, Immigration, proceedings, policies, gang]",27169
1,"Quituizaca v. Garland, 52 F.4th 103",denied,"<re.Match object; span=(1345, 40382), match='citizen of Ecuador, entered the United States in >","[withholding, <span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, removal, motive, persecution, gang, alien, ethnicity, but-for, ambiguity, causation, withholding-of-removal, eligibility, Immigration, concurrence, credibility, unambiguous, deference, burden of proof, social group, indigenous, persuasive, attacks, waived, petition for review, plain meaning, plain text, nationality, persecutor, suggests]",40390
2,"Liang v. Garland, 10 F.4th 106",denied,"<re.Match object; span=(819, 28175), match='citizen of the People\'s Republic of China, illeg>","[persecution, blacklist, <span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, removal, omission, credibility, credibility determinations, withholding, quotation, marks, cross-examination, inconsistencies, church, totality of the circumstances, well-founded, corroborate, disclose, eligible]",28201
3,"Doleck Nepali v. Barr, 828 Fed. Appx. 14",denied,"<re.Match object; span=(238, 9751), match='citizen of Nepal, seeks review of a November 28, >","[persecution, <span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, terrorist, removal, credibility determinations, withholding, credible]",9853
4,"Fangfang Xu v. Cissna, 434 F. Supp. 3d 43",granted,"<re.Match object; span=(63, 43720), match='citizen of the People\'s Republic of China, appli>","[<span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, adjudicate, delays, motion to dismiss, mandamus, USCIS, lack subject matter jurisdiction, unreasonable delay, plaintiff's claim, allegations, fail to state a claim, pleadings, quotation, due process, time frame, reasons, courts, alien, marks, subject matter jurisdiction, immigration, factors, rights, private right of action, judicial review, rule of reason, timetable, due process claim, plaintiff's right, reasonable time]",43803
...,...,...,...,...,...
662,"Xuexia Feng v. Garland, 846 Fed. Appx. 75",denied,"<re.Match object; span=(223, 3960), match='citizen of the People\'s Republic of China, seeks>","[motion to reopen, new evidence, persecution, torture, petition for review, per curiam, quotation, targets, marks, prima facie case, credibility, withholding, violence, removal, reopen]",4062
663,"Xu v. Garland, 2022 U.S. App. LEXIS 1008",denied,"<re.Match object; span=(224, 4600), match='citizen of the People\'s Republic of China, seeks>","[persecution, church, authorities, attend, petition for review, well-founded, practicing, letters, gospel, spread, underground, strangers, arrested, solid]",4702
664,"Yan Rong Liu v. Barr, 825 Fed. Appx. 26",denied,"<re.Match object; span=(224, 3883), match='citizen of the People\'s Republic of China, seeks>","[conditions, motion to reopen, petition for review, administrative notice, change condition, material change, per curiam, demonstrates, untimely, motions]",4019
665,"Yun Qing Huang v. Garland, 852 Fed. Appx. 609",denied,"<re.Match object; span=(226, 2511), match='citizen of China, seeks review of a November 16, >","[credibility determinations, petition for review, exhaust, meaningfully, credibility, reasons, waived]",2613


In [11]:
def regex_to_country(rematch):
    """Return the country name(s) that follow "citizen(s) of" in a regex match."""
    # Words that link the phrase together but are not part of the country name.
    skip_words = {"citizen", "citizens", "of", "the", "and", "both"}
    entire_str = rematch.string[rematch.start():]
    country = ''
    for word in entire_str.split(" "):
        if word.lower() in skip_words:
            continue
        elif word[:2].istitle():
            country += word + " "
        else:
            # The first word that is not capitalized ends the country name.
            break
    # str.strip returns a new string, so the stripped value has to be returned.
    return country.strip().strip(",")

In [12]:
# Rows without a regex match stay None so that isnull() can find them.
asylum_df["country"] = asylum_df["country"].apply(lambda x: regex_to_country(x) if x is not None else None)

In [13]:
asylum_df[asylum_df["country"].isnull()]


In [14]:
asylum_df.at[277, 'country'] = "Guatemala"
asylum_df.at[299, 'country'] = "Cameroon"
asylum_df.at[337, 'country'] = "China"
asylum_df.at[347, 'country'] = "confidential"
asylum_df.at[224, 'country'] = "Honduras"
asylum_df.at[410, 'country'] = "Bangladesh"
asylum_df.at[453, 'country'] = "Dominican Republic"
asylum_df.at[528, 'country'] = "Somalia"
asylum_df.at[569, 'country'] = "Mexico"
asylum_df.at[615, 'country'] = "Saint Lucia"
asylum_df.at[630, 'country'] = "not mentioned"
asylum_df.at[638, 'country'] = "not mentioned"
asylum_df.at[652, 'country'] = "Honduras"
asylum_df.at[654, 'country'] = "Iraq"
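Manual corrections like these can also be kept in one dictionary and applied in a loop, which makes the list easier to review and extend. The sketch below uses a small toy dataframe. `toy`, `country_fixes`, and the case names are hypothetical, and the check for unknown row labels is an added safeguard rather than something the notebook does.

```python
import pandas as pd

# Toy stand-in for asylum_df with two rows where extraction failed.
toy = pd.DataFrame(
    {
        "name": ["Case A", "Case B", "Case C"],
        "country": [None, "Nepal", None],
    }
)

# Row label -> corrected value (hypothetical corrections).
country_fixes = {0: "Guatemala", 2: "Cameroon"}

# Catch mistyped row labels before touching the dataframe.
unknown = [idx for idx in country_fixes if idx not in toy.index]
if unknown:
    raise KeyError(f"row labels not in dataframe: {unknown}")

for idx, value in country_fixes.items():
    toy.at[idx, "country"] = value

print(toy)
```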


In [15]:
asylum_df.at[26, 'decision'] = "partially granted"
asylum_df.at[44, 'decision'] = "partially granted"
asylum_df.at[46, 'decision'] = "partially granted"
asylum_df.at[49, 'decision'] = "partially granted"
asylum_df.at[109, 'decision'] = "granted"
asylum_df.at[219, 'decision'] = "denied"
asylum_df.at[224, 'decision'] = "partially granted"
asylum_df.at[277, 'decision'] = "partially granted"
asylum_df.at[300, 'decision'] = "denied"
asylum_df.at[349, 'decision'] = "denied"
asylum_df.at[351, 'decision'] = "denied"
asylum_df.at[375, 'decision'] = "granted"
asylum_df.at[380, 'decision'] = "denied"
asylum_df.at[382, 'decision'] = "partially granted"
asylum_df.at[392, 'decision'] = "partially granted"
asylum_df.at[401, 'decision'] = "partially granted"
asylum_df.at[418, 'decision'] = "partially granted"
asylum_df.at[422, 'decision'] = "denied"
asylum_df.at[453, 'decision'] = "partially granted"
asylum_df.at[473, 'decision'] = "denied"
asylum_df.at[500, 'decision'] = "partially granted"
asylum_df.at[521, 'decision'] = "denied"
asylum_df.at[545, 'decision'] = "granted"
asylum_df.at[547, 'decision'] = "denied"
asylum_df.at[556, 'decision'] = "partially granted"
asylum_df.at[560, 'decision'] = "partially granted"
asylum_df.at[567, 'decision'] = "denied"
asylum_df.at[570, 'decision'] = "partially granted"
asylum_df.at[588, 'decision'] = "partially granted"
asylum_df.at[595, 'decision'] = "partially granted"
asylum_df.at[638, 'decision'] = "partially granted"
asylum_df.at[640, 'decision'] = "denied"
asylum_df.at[650, 'decision'] = "partially granted"


In [16]:
droplist = [30, 32, 33, 41, 42, 43, 45, 48, 128, 206, 233, 269, 331, 335, 376, 381, 418, 419, 426, 461, 472, 487, 532, 536, 545, 555, 558, 572, 575, 632, 638, 641, 39, 43, 272, 307]
len(droplist)

36

In [17]:
asylum_df = asylum_df.drop(droplist, errors="ignore") # errors="ignore" skips rows that are already gone, so this cell is safe to re-run
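By default, `DataFrame.drop` raises a `KeyError` when a label is no longer in the index, which is why a second run of a drop cell fails. Passing `errors="ignore"` makes the step repeatable. The sketch below shows this on a toy dataframe and also removes duplicate labels first (in the list above, 43 appears twice, so it holds 35 distinct rows). `toy` and `unique_drops` are hypothetical names.

```python
import pandas as pd

toy = pd.DataFrame({"name": ["a", "b", "c", "d", "e"]})

droplist = [1, 3, 3]                 # 3 is listed twice on purpose
unique_drops = sorted(set(droplist)) # distinct labels only

# First run removes rows 1 and 3.
toy = toy.drop(unique_drops, errors="ignore")
# Second run is a no-op instead of raising KeyError.
toy = toy.drop(unique_drops, errors="ignore")

print(len(droplist), len(unique_drops))
print(toy)
```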

### Now clean up the country column

In [18]:
# Keep only the country name: drop everything from the first comma, period, or semicolon onward.
# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.
asylum_df.country = asylum_df.country.str.replace(r"[,.;].*", "", regex=True)
asylum_df.country = asylum_df.country.str.strip()


In [19]:
asylum_df[asylum_df.country == ""]

Unnamed: 0,name,decision,country,terms,op_length
109,"Ramirez v. Decker, 2020 U.S. Dist. LEXIS 39535",granted,,"[custody, arrested, aliens, parole, <span name=""SH_2355451274"" class=""SS_prior SS_SH"">asylum</span>, least restrictive, immigration, transferred, removal proceedings, arriving, sponsor, immigration judge, border, flight risk, detention, credible, detained, placement, turning, danger to the community, supervision, notice, physical custody, 18 year old, respectfully, conditions, recommend, jail, persecution, Directive]",37988
560,"Cano v. Decker, 2022 U.S. Dist. LEXIS 223992",partially granted,,"[burden of proof, disability, danger to the community, Immigration, detention, custody, clear and convincing evidence, due process, flight risk, private interest, habeas petition, detained, disorder, argues]",12960
567,"Gao v. Wolf, 2020 U.S. Dist. LEXIS 213957",denied,,"[removal order, Immigration, alien, court of appeals, eligibility, paroled, permanent resident, discretionary, lack jurisdiction, inspected, removal, application for adjustment, motion to dismiss, district court, subject-matter, appeals, factors, courts]",19353
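Checking for empty strings and checking for nulls catch different failures, so a single test for "anything that is not a usable string" can be handy. The sketch below is one way to do that on a toy dataframe. `toy`, `is_usable`, and `needs_review` are hypothetical names, and the rows are made up.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "name": ["Case A", "Case B", "Case C", "Case D"],
        "country": ["Ecuador", None, "", "  "],
    }
)


def is_usable(value):
    """True only for strings that contain something besides whitespace."""
    return isinstance(value, str) and value.strip() != ""


# Rows whose country is missing, empty, or blank.
needs_review = toy[~toy["country"].apply(is_usable)]
print(needs_review)
```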


In [20]:
asylum_df.at[109, 'country'] = "Guatemala"
asylum_df.at[560, 'country'] = "Guatemala"
asylum_df.at[567, 'country'] = "China"
asylum_df.at[351, 'country'] = "St. Kitts-Nevis"
asylum_df.at[556, 'country'] = "not mentioned"
asylum_df.country = asylum_df.country.str.replace("People's Republic of China", "China").str.replace("People's Republic China", "China")

In [21]:
asylum_df.to_csv("asylum_data.csv", index = False)
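One thing to keep in mind when this CSV is read back: a column holding Python lists, like `terms`, is written as text, so it comes back as a string rather than a list. A minimal sketch of the round trip follows, using an in-memory buffer instead of a file. `toy`, `buffer`, and `loaded` are hypothetical names, and `ast.literal_eval` is one possible way to restore the lists, not something the notebook itself does.

```python
import ast
import io

import pandas as pd

toy = pd.DataFrame(
    {
        "name": ["Case A"],
        "terms": [["asylum", " removal"]],
    }
)

# Write to an in-memory buffer and read it back, as a file round trip would.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# The list was saved as its text representation.
raw_value = loaded.loc[0, "terms"]
print(type(raw_value).__name__, raw_value)

# Parse the text back into a real list.
loaded["terms"] = loaded["terms"].apply(ast.literal_eval)
print(loaded.loc[0, "terms"])
```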