# SI 699 :: Seminar :: Book 3 of 3 :: Custom Model Generation

# Tutorial Roadmap

<b>Acquisition (Part 1 of 3 :: PRAW // Data Gathering)</b>
- We gathered data from Reddit, divided it up, and dumped it into some files

<b>Preparation (Part 2 of 3 :: Default NER System)</b>
- We examined the default entity recognizer and ways to highlight its output so we could judge how well it tags our posts

<b>Execution (Part 3 of 3 :: Named Entity Recognition Training)</b>
- Use labelled training data
- Consider our custom-recognizer performance

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
import pandas as pd
import random
import spacy
import warnings

from spacy.util import minibatch, compounding
from spacy.lang.en import English

# The training data
Our training data is effectively a list of tuples where the first element is a string of text (a post), and the second element is a dictionary. This dictionary contains a key, <code>entities</code>, which gives a list of tuples that describe where we can find each entity we think should be extracted and what to call it.

These tuples are (starting_index, ending_index, "labelType"), where the ending index is exclusive, as in a Python slice. So if we wanted to extract colors from "red and green plants", we would note the entities (0,3,"Color") and (8,13,"Color"). This can seem tedious, but the example code below makes finding indexes quick and easy:

In [3]:
import re
focal_string = "red and green plants"

# ("red and green plants",{"entities":[(0,3,"Color"),(8,13,"Color")]})
[match for match in re.finditer("red|green", focal_string, flags=re.IGNORECASE)]

[<re.Match object; span=(0, 3), match='red'>,
 <re.Match object; span=(8, 13), match='green'>]
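Going one step further, the `finditer` spans can be turned straight into a training tuple. The helper below, `annotate` (a hypothetical name, not part of spaCy or `re`), is a minimal sketch that labels every match of a single pattern with the same label. `m.end()` is already exclusive, so it matches the offset convention above.

```python
import re

def annotate(text, pattern, label, flags=re.IGNORECASE):
    """Build a spaCy-style training example (text, {"entities": [...]})
    by tagging every regex match in `text` with the same `label`."""
    entities = [(m.start(), m.end(), label)
                for m in re.finditer(pattern, text, flags=flags)]
    return (text, {"entities": entities})

example = annotate("red and green plants", r"red|green", "Color")
print(example)
# ('red and green plants', {'entities': [(0, 3, 'Color'), (8, 13, 'Color')]})

# The same idea works for course numbers:
print(annotate("Math 214 and Math 115", r"Math \d{3}", "Class"))
```

Adding word boundaries to the pattern (e.g. `r"\bred\b"`) avoids tagging "red" inside words such as "bored". Every match still deserves a quick read before it goes into the training data.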

The actual training data we'll use is:

In [4]:
TRAIN_DATA = [
    ("Michigan is in the ELITE 8!",
        {"entities":[(19,26,"Sport")]}),

    ("Returning student housing options What options are available for singles/doubles for a returning Junior?",
        {"entities":[(18,33,"Housing")]}),

    ("Living off(on-campus) Hi, junior-transfer student here.  I'm a predicament regarding my living options for the next school year. My parents and I live in the Ann Arbor-Ypsi area and I was wondering if it is worth it to live off campus (in an apartment/house) very close to downtown or just stay living with my parents.  Basically, I'm asking if there are any differences being minutes walks away from all the campus activities, living alone versus just buying a parking pass and driving to campus and enjoying uofm that way.  Answers from people that live distant from campus (needing a car) would be very useful.  Thanks for any input!",
        {"entities":[(219,234,"Housing"),(242,251,"Housing"),(252,257,"Housing"),(295,317,"Housing")]}),

    ("How to skip Physics 240 Hi! I'm a senior in High School at the moment, and I'm committing to U of M for Engineering! So I've been reading about Michigan Engineering being a GPA killer, specifically the required course Physics 240. I was wondering if there are any ways to skip this class other than getting a 5 on the AP Phys E/M test. I didn't take AP Phys C this year because I am planning to major in Chemical Engineering. Are there any summer classes that are offered or community college courses that would give me credit to skip Physics 240? Thanks!!",
        {"entities":[(12,23,"Class"),(218,229,"Class"),(535,546,"Class")]}),

    ("We need better mental health resources in this city, but also read this and protect yourselves!",
        {"entities":[(15,28,"Mental")]}),

    ("CAPS Not trying to bash on anyone in particular (mostly because I\'m not really sure who to direct this at) but the handling of mental health issues on campus and as a whole recently is pretty funny to me. I had to an emergency/urgent crisis meeting with a CAPS rep. I\'m not sure what their idea is of care. I spoke to someone who clearly was reading off a transcript, I assume some grad student practicing. Essentially, I was told to talk about what\'s going on (took like 35-45 minutes) then for the last 15ish minutes I was given basic coping mechanisms that I could have found by googling 'how to stop being depressed/anxious). I\'m not sure if this is normal or what. I definitely don\'t blame the dude I spoke to because I assume he was just trying to do his job. Its probably a good thing I spoke to this guy because I wasn\'t suicidal. If someone who wanted to die accidentally got paired with a robotic apathetic grad student during a serious crisis it could end awfully. I think I felt a little better after talking to the guy simply because it was so comical that a student crisis hotline was being manned by other students at this school who have lost interest in their work. I wish they would fund CAPS more, especially during this pandemic, so that these apathetic grad students can take care of smaller issues and problems and let licensed people deal with the serious mental health issues. Everyone is b\\*\\*\\*\\*ing about safety measures and testing (despite college students being one of the least likely groups to have serious implications from the disease) while serious mental health problems have quadrupled in the United States. I assume those numbers are likely worse for college students. Were just going to let people wallow away in isolation I guess; what a good holistic approach to community health. Also idk if its just my insurance, but 75% of the in-network therapists near here are 1)not accepting new patients and 2) over the age of 65. It honestly is a bit funny to me that when I finally get the strength to seek out help I have these types of issues. Does anyone have any good experiences with CAPS lately? or similar experiences. Or ideas on how to find help around Ann Arbor?     EDIT: apparently the person I talked to was a phd.",
        {"entities":[(0,4,"Mental"),(127,140,"Mental"),(256,260,"Mental"),(610,619,"Mental"),(620,627,"Mental"),
                     (1080,1094,"Mental"),(1206,1210,"Mental"),(1379,1392,"Mental"),(1584,1597,"Mental"),
                     (1752,1761,"Mental"),(2124,2128,"Mental")]}),

    ("EECS 183 honor code question Can we get honor coded if our labs look similar since we are allow to collab on the labs?",
        {"entities":[(0,8,"Class")]}),

    ("Subleasing or Airbnb Hello everyone, I really need your advice.   I haven't found housing yet for this summer class 2021. I am not sure whether I should stay in Airbnb by contracting month to month rental or find someone who is subleasing their Apt.   A quick note, I might also need a place to stay for Fall 2021 as well. It's my first time visiting Ann Arbor ever since I got accepted.  Kindly help here?   Thank you guys!!",
        {"entities":[(0,10,"Housing"),(82,89,"Housing"),(198,204,"Housing"),(228,248,"Housing"),(286,299,"Housing")]}),

    ("Question About Single Dorms  I was recently admitted as an incoming sophomore transfer student. I’m very covid-conscious, and I want to have my own space without a roommate. I explored my options in apartments, but most of the singles I could find were pretty expensive so I thought I’d look into the single dorms. However, I’m having trouble finding information about where on campus there are single dorms and how likely I am to get one. I don’t mind community bathrooms or anything, but I’d like my own room on central campus.  Do most residence halls have single dorms available? Am I likely to get one as an incoming sophomore transfer student?   Please no responses telling me you strongly recommend not getting a single because i’ll become a hermit; I have friends on campus, including my girlfriend who I will be seeing often. I won’t be staying in my dorm 24/7. If it’s unlikely that i’ll receive a single, I’ll try to find somebody likeminded.",
        {"entities":[(15,27,"Housing"),(199,209,"Housing"),(301,313,"Housing"),(395,407,"Housing"),(539,554,"Housing"),(560,572,"Housing")]}),

    ("Math 214 Relevance in IOE Coursework I’m taking Math 214 right now and it’s not too bad but it’s definitely not a class I enjoy material-wise. I’m planning on majoring in IOE, so I was wondering how much knowledge from Math 214 comes up in IOE courses. I know Math 214 is a pre-req for IOE, but I was curious to know what a current IOE student/recent graduate about how much linear algebra I’ll be doing in the major.  Thanks!",
        {"entities":[(0,8,"Class"),(48,56,"Class"),(219,227,"Class"),(260,268,"Class")]}),

    ("Is Michigan allowing fans to go to sporting events? I know the BIG 10 is allowing it now but I’m not sure if Michigan has changed their policies yet",
        {"entities":[(35,50,"Sport")]}),

    ("Men’s Basketball Season I would like to say that we had a great run this season especially when facing challenges relating to COVID-19. The hard work of the basketball team will never go unnoticed, and I want to go cry myself to sleep! I hope we bounce back big. Go Blue!!!",
        {"entities":[(6,23,"Sport")]}),

    ("Transferring within Taubman College Hi all! I’m an international student admitted into the Urban Tech programme at Taubman (starting Winter 2022) I was wondering if it’s possible to transfer to the Archi programme even before school officially starts.   Thanks in advance!!",
        {"entities":[]}),

    ("What has your experience at UMich been like? Hi! I'm applying to schools this fall and UofM has been on my list for a while, but I'm looking for people who actually go to the school to give me more information on things that you only really know if you go there (like the stuff they don't tell you on the website). How's the overall experience? Campus life? Food? Party scene? Is it really really cold? Any information at all helps, I'm looking to go to school for Biology.",
        {"entities":[]}),

    ("WGS 240 Research Inquiry Hello all,  I am writing to inquire about potential interest in/availability for an interview (approx. 45 minutes) for a final project in WGS 240. For the project, my group is researching how intersectional sociopolitical and medicalized structures impact Black women and girls' access to and experiences with reproductive healthcare.  The final project will be a presentation that integrates course curriculum with the interview materials and offers cross-interview analysis and awareness-raising. While I have a list of prepared questions, the interviewee is encouraged to address the topic and share their knowledge as they see fit during the course of the conversation. They will also have the opportunity to review the completed project, and any of their contributions will be anonymized.  Would this be something you would like to participate in? If so, please feel free to message me at your convenience at [mccaulm@umich.edu](mailto:mccaulm@umich.edu). I will do my very best to accommodate your schedule and any other needs, if that is the case.  Regardless, I am wishing everyone a wonderful rest of their semester!",
        {"entities":[(0,7,"Class"),(163,170,"Class")]}),

    ("Looking for a sublease for the summer I'm moving to AA for the summer and looking for a place to stay.  Preferably somewhere downtown, but I'm open to anywhere in the area.  I'm not a Michigan student, so I probably won't be able to stay anywhere owned by the university.  It'll only be for a few months, so somewhere that's furnished would be a huge plus.",
        {"entities":[(14,22,"Housing"),(42,54,"Housing"),(74,93,"Housing")]}),

    ("Transferring Credit from IB Scores? I plan on transferring to umich and in highschool i took a couple of HL classes which would allow me a few credits at umich, but I’m unsure if umich would accept them? I heard for schools like UCSD, they don’t accept ib grades that are older than a year.Does anyone know?",
        {"entities":[]}),

    ("How easy is it to get a job? I'm a prospective student planning to enroll in the fall. How easy is it to get a job on campus and how flexible are the hours usually? Is it realistic to aim to get a job my first semester?",
        {"entities":[]}),

    ("I have an exam on Tuesday during the Michigan game. Any chance to get it moved? I’ve emailed my teacher but we’ve had such few opportunities to feel like a community this year and now it’s just going to be taken by an exam.",
        {"entities":[(33,50,"Sport")]}),

    ("How hard are UMICH premed courses? I was accepted EA to LSA. I am a premed/predental student. Based on the research i have done, i've noticed that the premed courses are extremely difficult and kill gpa. Although I know that these classes will 100% prepare me for the MCAT/DAT and med school, I don't want to ruin my gpa (i know that sounds stupid but it's still a worry for me) which will ultimately ruin my confidence in pursuing after med school. I am receiving a ton of aid which i feel like is an offer i cant pass up on. The other school i am looking at is UAB. They also have a great premed program and it will be absolutely free for me. But i feel like the prestige of UMICH is just different. I don't wanna make a decision based on other people's judgement either. Every doctor i've shadowed have gone to undergrad schools that I have not even heard of for the most part, so i know it doesn't matter that much. But ultimately, I just wanna know how difficult those premed classes at umich are and whether it's even worth it for med/dental school.   Thank you if anyone answers!",
        {"entities":[(19,33,"Class"),(151,165,"Class"),(974,988,"Class")]}),

    ("Math 115 Exam Winter Scale So I was wondering what % of questions, usually speaking does it take to get a B- or higher in an exam? I keep taking practice tests and I have been hitting in the high 50s low-mid 60%s range and was wondering what that would come out to be grade letter wise.",
        {"entities":[(0,8,"Class")]}),

    ("Looking for roommate north campus 2021-2022 Posting for a friend: Is anyone looking for a housemate to fill a room on north campus for fall/winter 2021-2022? If not, any suggestions?",
        {"entities":[(0,20,"Housing"),(76,99,"Housing"),(103,114,"Housing")]}),

    ("Elite Eight!!!!!!! Feels great to be a part of the best community ever! GO BLUE AND WHAT A TIME TO BE ALIVE!!! Honestly, March Madness is helping me cope with looking at lecture slides for the past months.",
        {"entities":[(0,11,"Sport"),(121,134,"Sport")]}),

    ("Every time I check my email",
        {"entities":[]}),

    ("How to do well at this school if I'm chronically lazy? I got accepted as a sophmore (maybe juinior??) from a community college with a 4.0. I hear that UofM is a very challenging school and grade inflation is nonexistent. This is a little bit alarming as I am an extremely lazy and I think there were maybe 5 projects in total I ever started more than a day in advance of the deadline during my entire first two years at college. Despite this, I have always pulled through at the last minute. After some lurking, I get the impression that is not going to fly and tbh I am a little worried. I also sleep 10 hours every night and usually wake up after noon, any tips for improving? I am legit the laziest person I know and the only one in my entire extended family who's ever gotten into anything even approaching a selective college.",
        {"entities":[]}),

    ("GEO and Councilmember Nelson look to change leasing laws to ease student housing hunt",
        {"entities": [(65,80,"Housing")]}),

    ("International student on some questions about visa and vaccination Hey umich Reddit,   I'm a freshman international student moving into campus next semester since the classes won't be virtual anymore. I have some quick questions about visas since some peers recommended me to utilize Reddit to ask questions.   First, what should I do after I receive my F-2 visa? Are there procedures I have to know regarding when to move in and such?  Second, I am currently waiting for my certification email from the international center for my re-printing of my I-20. How long does it usually take? I sent mine about a week ago.   Third, I am aware that I cannot be vaccinated unless everyone with a greencard is vaccinated. Is that true?     Thanks for your help. Been reading the posts for quite a while to get info on this school. I hope I am not in the wrong subreddit.",
        {"entities": [(175,191,"Remote"),(418,425,"Housing")]}),
      
    ("M-Sci Has anyone done this program? Can they give any two cents on it?",
        {"entities": []}),

    ("Scenes from Yesterday’s Win. Let’s do it again on Tuesday",
        {"entities":[(12,17,"Sport")]}),

    ("Swim Coaching Jobs? Heya, I'm a freshman who will probably enroll in the fall and ever since I got injured I've been working as a swim coach which I want to continue in college. I was wondering if you guys knew if there were opportunities to do so since most swimming seems to happen in schools here. Obviously I can't work full time, so is it a realistic ambition?   I'm level 2 qualified in the UK and also a qualified lifeguard. I have like 3 years of experience at a relatively high level.   I know everyone says don't get a job in your first semester and all but I'm an international so I'm gonna need all the cash I can get bc tuition🙃🙃 and I also love my job and don't really want to stop :)",
        {"entities":[(0,13,"Sport")]}),

    ("ELECTRIC ENGINEERING MAJOR Hi! I’m an incoming freshman and I got into Electrical Engineering in Warren College at UCSD. I also got into college of engineering at Umich.(undergrad)   I’ve been having a hard time deciding between the two and i’m hoping you guys can help me out with a few of my questions. Even if you don’t know much about the Umich pls feel free to comment.   In general, how good are the profs at Umich or is it just a mixed bag like basically every other uni?   How good/helpful is the careers department in helping you get internships and would you say being in cali is a massive advantage for finding internships and jobs in tech?   Are the reputations of either school significant when applying for jobs and if so which school has the better rep?   How easy is it to make friends and are there school events, parties etc. to help with socialising? Also what is there to do in general off campus?   How good and affordable is the food in the area and also what’s the quality and variety of food in the dining halls like?  What is public transportation like? If staying off campus would I need a car to get around?   I’ve grown up in a semi arid climate( hot and dry for much of the year with mil) so going to ucsd wouldn’t be too different from what i’m used to. I’ve read that long cold dark winters like in ann arbor can cause seasonal depression. So have any of you lived in such places and if so how badly does the weather affect you and do you think it should be an important factor to consider when deciding on uni?   If you have anything else to add to help me pls don’t hesitate to mention it!   Thank you in advance, much appreciated. :)",
        {"entities":[(1082,1100,"Housing")]}),

    ("Change Major as Admitted Transfer Student Hey, I’m just admitted as a transfer student of CoE. Actually, I was accepted as a major in mechanical engineering, but I want to change my major to CS or ECE. Could it be possible to switch my major as a transfer student? I am a little worried that there is no guarantee to switch into CS major because of no space for transfer students.",
        {"entities":[]}),

    ("Terminating Munger contract Has anyone successfully been able to terminate their Munger contract pre-COVID? I'll be graduating in the Fall and ideally would like to renew my lease, however supposedly graduating early isn't a good enough reason to terminate my contract.",
        {"entities": [(0,27,"Housing"),(81,96,"Housing"),(174,179,"Housing"),(247,268,"Housing")]}),

    ("Living near north v.s. central campus as someone in the engineering department Hi everyone! I'll be a grad student this fall studying robotics (I'm assuming most if not all will be on north campus either in the engineering/robotics buildings). I've heard it can be isolating to live near north campus and was wondering if it would be a hassle to live near central campus (e.g. Kerrytown) but commute to north campus for classes? Are there bus routes that go straight from residential areas near downtown to north campus? I'm really new to AA so any insight would be great, thanks!",
        {"entities":[(0,37,"Housing"),(278,300,"Housing"),(346,370,"Housing")]})
]
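Hand-typed offsets are easy to get wrong, and during training spaCy 2.x only reports a misaligned span as a warning. A minimal pure-Python check, sketched below with a hypothetical `check_spans` helper, flags spans that fall outside the text, overlap an earlier span (flat NER tags cannot represent overlaps), or begin or end on whitespace. spaCy 2.x also offers `spacy.gold.biluo_tags_from_offsets`, which marks tokens of spans that don't line up with token boundaries with `"-"`.

```python
def check_spans(train_data):
    """Return (example_index, start, end, label, problem) for suspicious spans
    in a list of (text, {"entities": [(start, end, label), ...]}) examples."""
    problems = []
    for i, (text, ann) in enumerate(train_data):
        prev_end = 0
        for start, end, label in sorted(ann.get("entities", [])):
            if not (0 <= start < end <= len(text)):
                problems.append((i, start, end, label, "out of range"))
                continue
            if start < prev_end:
                problems.append((i, start, end, label, "overlaps previous span"))
            span = text[start:end]
            if span != span.strip():
                problems.append((i, start, end, label, "leading/trailing whitespace"))
            prev_end = max(prev_end, end)
    return problems

# Usage on the real data (prints nothing if every span passes these checks):
# for problem in check_spans(TRAIN_DATA):
#     print(problem)
```

These checks only catch mechanical mistakes. Printing `text[start:end]` for each span is still the quickest way to confirm that the right words were labelled.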


### And now to make the model:
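`model_maker` batches examples with `minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))`. To see what that schedule does, here is a small pure-Python re-implementation. `compounding_sketch` and `minibatch_sketch` are hypothetical names written to mirror the spaCy 2.x helpers as we understand them; they are not imported from spaCy. The batch size starts at 4, is multiplied by 1.001 after every batch, is capped at 32, and each size is truncated to an int.

```python
from itertools import islice

def compounding_sketch(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... clipped at stop."""
    def clip(value):
        # Handles growing (start < stop) and shrinking (start > stop) schedules.
        return max(value, stop) if start > stop else min(value, stop)
    curr = float(start)
    while True:
        yield clip(curr)
        curr *= compound

def minibatch_sketch(items, sizes):
    """Cut items into consecutive batches, taking int(size) items per batch."""
    items = iter(items)
    for size in sizes:
        batch = list(islice(items, int(size)))
        if not batch:
            return
        yield batch

# One epoch over 34 examples with a freshly created schedule, as model_maker does:
print([len(b) for b in minibatch_sketch(range(34), compounding_sketch(4.0, 32.0, 1.001))])
# [4, 4, 4, 4, 4, 4, 4, 4, 2]
```

`model_maker` builds a new `compounding` generator at the start of every epoch. With our 34 examples, the size therefore never gets past about 4.03, so in practice every batch holds 4 posts and the last holds 2. Growing batches would require creating the generator once, outside the epoch loop. With this little data, a fixed `size=4` says the same thing more plainly.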


In [5]:

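def model_maker(model=None, output_dir=None, n_iter=100, v=False):
    """Load the model, set up the pipeline and train the entity recognizer.

    model      -- name/path of an existing spaCy model, or None for a blank English model
    output_dir -- optional directory to save the trained model to
    n_iter     -- number of passes (epochs) over TRAIN_DATA
    v          -- if True, print the loss after every epoch
    """
    # Adapted from the spaCy 2.x NER training example: https://spacy.io/usage/training
    # Note: trains on the global TRAIN_DATA list and shuffles it in place.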
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")

    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner, last=True)
    # otherwise, get it so we can add labels
    else:
        ner = nlp.get_pipe("ner")

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    # only train NER
    with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
        # show warnings for misaligned entity spans once
        warnings.filterwarnings("once", category=UserWarning, module='spacy')

        # reset and initialize the weights randomly – but only if we're
        # training a new model
        if model is None:
            nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                    texts,  # batch of texts
                    annotations,  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorize data
                    losses=losses
                    )
            if v:
                print("Losses", losses)

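    # Sanity check: run the trained model over the *training* posts.
    # The model has already seen these texts, so this overstates real-world accuracy.
    for text, _ in TRAIN_DATA:
        doc = nlp(text)
        print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        #print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])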

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir(parents=True)  # also create missing parent folders
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        for text, _ in TRAIN_DATA:
            doc = nlp2(text)
            print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
            #print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])
    
    return nlp

custom_NER = model_maker(n_iter=100, v=0)

Created blank 'en' model
Entities [('Terminating Munger contract', 'Housing'), ('Munger contract', 'Housing'), ('lease', 'Housing'), ('terminate my contract', 'Housing')]
Entities [('Looking for roommate', 'Housing'), ('looking for a housemate', 'Housing'), ('fill a room', 'Housing')]
Entities [('Math 214', 'Class'), ('Math 214', 'Class'), ('Math 214', 'Class'), ('Math 214', 'Class')]
Entities [('student housing', 'Housing')]
Entities [('Math 115', 'Class')]
Entities [('Subleasing', 'Housing'), ('housing', 'Housing'), ('rental', 'Housing'), ('subleasing their Apt', 'Housing'), ('place to stay', 'Housing')]
Entities []
Entities []
Entities [('Elite Eight', 'Sport'), ('March Madness', 'Sport')]
Entities [('staying off campus', 'Housing')]
Entities []
Entities [('EECS 183', 'Class')]
Entities [('live off campus', 'Housing'), ('apartment', 'Housing'), ('house', 'Housing'), ('living with my parents', 'Housing')]
Entities [('sublease', 'Housing'), ('moving to AA', 'Housing'), ('looking for a
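The printout above scores the model on the same 34 posts it was trained on, so it mostly shows how well the model memorized them. A fairer check is exact-match precision, recall and F1 on posts that were annotated but held out of training. The tutorial does not include such a held-out set, so `held_out` below is hypothetical. `corpus_prf` is a minimal micro-averaged scorer sketched for illustration. spaCy ships its own `Scorer`, which this does not reproduce.

```python
def corpus_prf(pairs):
    """Micro-averaged exact-match precision, recall and F1.
    `pairs` yields (gold_spans, predicted_spans) per document, where each
    span is a (start, end, label) tuple; a hit needs all three to match."""
    tp = n_gold = n_pred = 0
    for gold, pred in pairs:
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)
        n_gold += len(gold)
        n_pred += len(pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

example_pairs = [
    ([(0, 8, "Class"), (13, 21, "Class")], [(0, 8, "Class"), (9, 12, "Sport")]),
    ([], []),
]
print(corpus_prf(example_pairs))  # (0.5, 0.5, 0.5)

# Hypothetical usage with the trained model and a held-out list shaped like TRAIN_DATA:
# pairs = [(ann["entities"],
#           [(e.start_char, e.end_char, e.label_) for e in custom_NER(text).ents])
#          for text, ann in held_out]
# print(corpus_prf(pairs))
```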

# Try Out the Custom Model

In [6]:
test_data = pd.read_csv("/content/drive/Shareddrives/SI699 Capstone/Tutorial/data/test_data.csv")

In [7]:
test_data.head()

| Unnamed: 0 | id | fused |
|---|---|---|
| 0 | mdo8i5 | thinking of transferring out of u of m current... |
| 1 | mdjgt6 | Affordable Off-Campus Housing I have been acce... |
| 2 | mdirba | Questions for an Ecology/Evolutionary Bio majo... |
| 3 | mdi0cw | School of Information Decisions When do the SI... |
| 4 | mdhw7f | How hard is it to transfer majors within and a... |
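The cells below render one post at a time. To get a rough picture of what the custom model finds across the whole test set, one option is to tally its predicted labels. `label_counts` is a hypothetical helper. It only assumes the model can be called on a string and returns something with `.ents`, where each entity has a `.label_`, which is how a spaCy `Language` object such as `custom_NER` behaves. The counts are unvalidated predictions, not ground truth.

```python
from collections import Counter
from types import SimpleNamespace

def label_counts(ner, texts):
    """Count predicted entity labels over many texts."""
    counts = Counter()
    for text in texts:
        counts.update(ent.label_ for ent in ner(text).ents)
    return counts

# Tiny stand-in model for illustration only: tags any post mentioning "Math" as Class.
def toy_ner(text):
    ents = [SimpleNamespace(label_="Class")] if "Math" in text else []
    return SimpleNamespace(ents=ents)

print(label_counts(toy_ner, ["Math 214 help", "Elite Eight!", "Math 115 scale"]))
# Counter({'Class': 2})

# With the real model and DataFrame (not run here; fillna guards against empty posts):
# label_counts(custom_NER, test_data.fused.fillna(""))
```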


In [8]:
#### Set the displaCy highlight color for each custom entity label with a dictionary
colors = {"CLASS": "cyan", "SPORT": "orange", "HOUSING": "violet", "REMOTE":"green", "MENTAL":"red"}
options = {"ents": ["CLASS", "SPORT", "HOUSING", "REMOTE", "MENTAL"], "colors": colors}
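The same `options` can also render the hand-labelled training examples, which makes it easy to set the gold annotations beside the model's output. displaCy accepts pre-computed entities through `manual=True` as a dict with `text`, `ents` (each with `start`, `end`, `label`) and an optional `title`. `to_displacy` is a hypothetical helper that builds that dict from one `TRAIN_DATA` tuple.

```python
def to_displacy(example, title=None):
    """Convert (text, {"entities": [(start, end, label), ...]}) into the
    dict displaCy expects for style="ent" with manual=True."""
    text, annotations = example
    ents = [{"start": start, "end": end, "label": label}
            for start, end, label in sorted(annotations.get("entities", []))]
    return {"text": text, "ents": ents, "title": title}

print(to_displacy(("red and green plants", {"entities": [(8, 13, "Color"), (0, 3, "Color")]})))

# In the notebook (not run here):
# displacy.render(to_displacy(TRAIN_DATA[0]), style="ent", manual=True,
#                 jupyter=True, options=options)
```

`model_maker` shuffles `TRAIN_DATA` in place, so an index such as `TRAIN_DATA[0]` points at whichever post currently sits there, not necessarily the first one listed.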

<b>default</b>

In [9]:
nlp = spacy.load("en_core_web_sm")
from spacy import displacy
displacy.render(nlp(test_data.iloc[0].fused.replace('\n\n','\n')), jupyter=True, style='ent')

<b>custom</b>

In [10]:
from spacy import displacy
displacy.render(custom_NER(test_data.iloc[0].fused.replace('\n\n','\n')), jupyter=True, style='ent', options=options)

<b>default</b>

In [11]:
nlp = spacy.load("en_core_web_sm")
from spacy import displacy
displacy.render(nlp(test_data.iloc[5].fused.replace('\n\n','\n')), jupyter=True, style='ent')

<b>custom</b>

In [12]:
from spacy import displacy
displacy.render(custom_NER(test_data.iloc[5].fused.replace('\n\n','\n')), jupyter=True, style='ent', options=options)

<b>default</b>

In [13]:
nlp = spacy.load("en_core_web_sm")
from spacy import displacy
displacy.render(nlp(test_data.iloc[393].fused.replace('\n\n','\n')), jupyter=True, style='ent')

<b>custom</b>

In [14]:
from spacy import displacy
displacy.render(custom_NER(test_data.iloc[393].fused.replace('\n\n','\n')), jupyter=True, style='ent', options=options)
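Eyeballing two renders works for a handful of posts, but listing the differences is quicker for more. `compare_spans` is a hypothetical helper that matches spans by character offsets only, because the default model's labels (such as ORG or DATE) and our custom labels never coincide.

```python
def compare_spans(a, b):
    """a, b: lists of (start, end, label) from two models on the same text.
    Group spans by whether their (start, end) offsets occur in a only,
    b only, or both (shared spans keep both models' labels)."""
    a_off = {(s, e): label for s, e, label in a}
    b_off = {(s, e): label for s, e, label in b}
    return {
        "shared": sorted((s, e, a_off[(s, e)], b_off[(s, e)])
                         for s, e in a_off.keys() & b_off.keys()),
        "only_a": sorted((s, e, a_off[(s, e)]) for s, e in a_off.keys() - b_off.keys()),
        "only_b": sorted((s, e, b_off[(s, e)]) for s, e in b_off.keys() - a_off.keys()),
    }

print(compare_spans([(0, 8, "ORG"), (20, 29, "DATE")],
                    [(0, 8, "Class"), (30, 37, "Housing")]))

# In the notebook (not run here), for one post:
# text = test_data.iloc[5].fused.replace('\n\n', '\n')
# default = [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]
# custom = [(e.start_char, e.end_char, e.label_) for e in custom_NER(text).ents]
# compare_spans(default, custom)
```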