# Wrangling + Tf-Idf model

In [437]:
import pandas as pd

import warnings
#warnings.filterwarnings('ignore')

In [438]:
#data = pd.read_csv('data.csv', error_bad_lines=False, encoding="utf-8") # CSV old way
data = pd.read_json("data_5scheduler.json") # json new way

Look at the data:

In [439]:
print(data.describe(include="object"))

                title   identifier description  source instructors offered  \
count            4443         4443        4443    4443        4443    4443   
unique           3761         4234        3989       5        1231     139   
top     Senior Thesis  PHYS-178-KS              Pomona          []           
freq               56            2          95    1446        2023    1559   

       prerequisites corequisites  
count           4443         4443  
unique           716           28  
top                                
freq            3359         4411  


### Duplicates:
It looks like there is only 3989 unique course descripition so let's remove duplicates based on 'description' column.
There are also rows with empty descriptions, which are not helpful

In [440]:
print(len(data))
data = data.drop_duplicates(subset='description')
data = data[data["description"] != ""]
print(len(data))

4443
3988


In [441]:
data.head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee
0,Introduction to American Cultures,AMST-103-HM,An interdisciplinary introduction to principal...,HarveyMudd,300,[Staff],,,,False,0
1,Print and American Culture,AMST-115-HM,Covers numerous developments in American print...,HarveyMudd,300,[Anup Gampa],,,,True,0
2,Hyphenated Americans,AMST-120-HM,A focus on the experience of immigrants in the...,HarveyMudd,300,[Balseiro],,,,False,0
3,"Life: Knowledge, Belief, and Cultural Practices",ANTH-110-HM,An exploration of cultural attitudes toward li...,HarveyMudd,300,[de Laet],,,,False,0
4,Introduction to the Anthropology of Science an...,ANTH-111-HM,An introduction to science and technology as c...,HarveyMudd,300,[Marianne De Laet],,,,True,0
5,War and Conflict,ANTH-115-HM,“The wings of the butterfly—that cause the hur...,HarveyMudd,300,[de Laet],,,,False,0
6,Rationalities,ANTH-134-HM,What does it mean to be rational? Does it mean...,HarveyMudd,300,[de Laet],Offered alternate years,Any introductory course in anthropology or any...,,False,0
7,A History of Landscape Photography,ARHI-131-HM,This course explores how photographic landscap...,HarveyMudd,300,[Fandell],,,,False,0
8,Modern and Contemporary Art Practices,ART-002-HM,This class is an experimental lecture style ar...,HarveyMudd,300,[Fandell],,,,False,0
9,Photography,ART-033-HM,Approaching the medium from an artistic perspe...,HarveyMudd,300,[Fandell],,ART002 HM,,False,150


### Tf-Idf with scikit-learn
[Description](https://monkeylearn.com/blog/what-is-tf-idf/)

[Usage](https://kavita-ganesan.com/tfidftransformer-tfidfvectorizer-usage-differences/#.Y1M42ezMJhF)

Here is an example of how Tf-Idf would work if our documents were the following 4 sentences:

In [442]:
import sklearn
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

corpus = [
     'this is the first document',
     'this document is the second document',
     'and this is the third one',
     'is this the first document',
]
vectorizer = TfidfVectorizer(use_idf=True)
vectors = vectorizer.fit_transform(corpus)
firstv = vectors[0]
df = pd.DataFrame(firstv.T.todense(), index=vectorizer.get_feature_names(), columns=["tfidf"])
df = df.sort_values(by=["tfidf"], ascending = False)
print("TfIdf values for the first sentence")
print(df)


TfIdf values for the first sentence
             tfidf
first     0.580286
document  0.469791
is        0.384085
the       0.384085
this      0.384085
and       0.000000
one       0.000000
second    0.000000
third     0.000000


In the example above we can see the importance of each word ranked for the first sentence `'this is the first document'`. So, for example the word `first` is important since it doesn't appear in any other document. The word `the` is not as important since it appears in all other documents. And the word `third` is not important at all since it doesn't even appear in the first document.

### Rank classes based on a given word
Function `tfidf_word(word, data)` takes in the word we are interested in and the data we are looking at. The function returns an updated dataframe with a new column `"score"` that gives each class a score of importance based on the input word. 

In [443]:
def tfidf_word(word, data_1):
    data_2 = data_1.copy() # since we don't want to be making changes to our original dataframe
    corpus = list(data_2.description)
    vectorizer = TfidfVectorizer(use_idf=True)
    vectors = vectorizer.fit_transform(corpus)

    score_for_word = []
    words = vectorizer.get_feature_names()
    try:
        index = words.index(word)
    except:
        print("'" + word + "'" + " is not mentioned in any course descriptions")
        return

    for i in range(0, len(corpus)):
        value = vectors[i].T.todense()[index]
        score_for_word.append(value)

    score_for_word = [float(i) for i in score_for_word] # type cast each score to a float

    data_2["score"] = score_for_word
    data_2 = data_2.sort_values(by=["score"], ascending = False)
    return data_2

For example, let's say we are interested in ranking all of the classes based on the word `computer`:

In [444]:
tfidf_word('computer', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
132,Computer Science Seminar,CSCI-181-HM,Advanced topics of current interest in compute...,HarveyMudd,0,[Staff],Fall and Spring,Permission of instructor,,False,0,0.442194
525,Special Topics in Computer Science,CSCI-181-CM,Selected topics in computer science. May be re...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.431867
1490,Computer Science Colloquium,CSCI-188-PO,Colloquium presentations and discussions of to...,Pomona,0,[Joseph C Osborn],Each semester.,"CSCI 051A PO , or CSCI 051G PO , or CSCI 051J ...",,True,0,0.422261
426,Introduction to Computational Neuroscience,BIOL-133L-KS,This course provides computational skills for ...,ClaremontMckenna,100,[],Every fall,,,False,0,0.342191
1491,Computer Science Senior Seminar,CSCI-190-PO,"Reading, discussion and presentation of resear...",Pomona,25,[Joseph C Osborn],Each semester.,Senior standing and two CSCI core courses (inc...,,True,0,0.33416
1060,Computational Physics and Engineering,PHYS-100-KS,This course is a comprehensive introduction to...,ClaremontMckenna,100,[Scot Gould],Every spring,,,True,0,0.327428
518,Fundamentals of Computer Science,CSCI-052-CM,"A solid foundation in functional programming, ...",ClaremontMckenna,100,[],Occasionally,,,False,0,0.314859
3358,Computational Physics and Engineering,PHYS-100-KS,This course is a comprehensive introduction to...,Scripps,100,[],Every spring,"PHYS033L KS , PHYS034L KS ; or PHYS030L KS ,...",,False,0,0.311172
137,Computer Science Colloquium,CSCI-195-HM,Oral presentations and discussions of selected...,HarveyMudd,50,[Melissa E. O'Neill],Fall and Spring,Juniors and seniors only,,True,0,0.276373
106,Introduction to Biology and Computer Science,CSCI-005GR-HM,This course introduces fundamental concepts fr...,HarveyMudd,300,"[Wu, Bush (Biology)]",Fall,,,False,0,0.257451


These are the first 10 instances of the classes that are most related to the word `computer` ranked in descending order (more related classes are on top). So, we could recomend a student who is interested in `computer`
 to take these classes.

Bellow are the outputs for fords `data, culture, activism, fiction, environment`

In [445]:
tfidf_word('data', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
526,Advanced Projects in Data Science,DS-180-CM,This course allows teams of students to wrestl...,ClaremontMckenna,100,[Jeho Park],Every year,,,True,0,0.491459
1365,Data Analysis and Programming for the Life Sci...,BIOL-174-PO,This course explores the analysis of big data ...,Pomona,100,[Andre Cavalcanti],Last offered spring 2019.,BIOL 040 PO and one of the following CSCI 005 ...,,True,0,0.449229
513,Foundations of Data Science,CSCI-036-CM,Data science is the interdisciplinary study of...,ClaremontMckenna,100,[Sarah Cannon],Every year,,,True,0,0.437465
269,Nonlinear Data Analytics,MATH-178-HM,Analysis of nonlinear large dynamic data inclu...,HarveyMudd,300,[Gu],Fall,CSCI070 HM and (CSCI140 HM or MATH131 HM or...,,False,0,0.393787
3254,CS1: Intro to Python and Viz,MS-059-SC,This is an introduction to computer programmin...,Scripps,0,[],,,,False,0,0.389261
2974,Econometrics,ECON-125-SC,Statistical techniques for testing economic mo...,Scripps,100,[Roberto Pedace],,ECON 101 and ECON 120 .,,True,0,0.372383
580,Accounting Data Analytics,ECON-160-CM,This course will introduce students to the use...,ClaremontMckenna,100,[George Batta],Every year,,,True,0,0.367099
110,Data Structures and Program Development,CSCI-070-HM,Abstract data types including priority queues ...,HarveyMudd,300,"[Melissa E. O'Neill, Erin Talvitie]",Fall and Spring,"(CSCI060 HM or CSCI042 HM ), and at least one...",,True,0,0.36479
523,Introduction to Data Mining,CSCI-145-CM,Data mining is the process of discovering patt...,ClaremontMckenna,100,[Charles Griffiths],Every year,,,True,0,0.345841
969,Introduction to Data Mining,MATH-166-CM,Data mining is the process of discovering patt...,ClaremontMckenna,100,[],Every year,,,False,0,0.344338


In [446]:
tfidf_word('culture', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
3153,Introduction to the Philosophy and History of ...,HIST-123-SC,This course will focus on some of the major wo...,Scripps,100,[],Every year,,,False,0,0.36554
3183,Introduction to the Philosophy and History of ...,HMSC-123-SC,This course will focus on some of the major wo...,Scripps,100,[],,,,False,0,0.362115
903,Film and Mass Culture,LIT-138-CM,This course will examine film as art and as me...,ClaremontMckenna,100,[],Every third year,,,False,0,0.297434
785,Culture and Society in Weimar and Nazi Germany,HIST-139E-CM,A study of the transformation of German cultur...,ClaremontMckenna,100,[],Every other year,,,False,0,0.258521
4122,Popular Culture,MS-125-PZ,This course will cover a broad range of histor...,Pitzer,0,[],,,,False,0,0.248049
3023,Literature and Popular Culture in the Antebell...,ENGL-143S-SC,The years preceding the Civil War saw both the...,Scripps,100,[],Every other year,,,False,0,0.245847
3726,Visual Culture at the Margins,ASAM-171-PZ,This course will examine various forms of visu...,Pitzer,0,[],,,,False,0,0.23808
748,Cold War America,HIST-099-CM,"The Cold War dramatically shaped the politics,...",ClaremontMckenna,100,[Lily Geismer],Occasionally,,,True,0,0.233963
764,Cold War America,HIST-118-CM,"The Cold War dramatically shaped the politics,...",ClaremontMckenna,100,[],Every other year,,,False,0,0.233417
3187,Critical Theory and Modern Culture,HMSC-136-SC,This course explores historical and contempora...,Scripps,100,[],,,,False,0,0.230769


In [447]:
tfidf_word('activism', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
1409,"Latinx Social Movements: Identity, Power, and ...",CHST-136-CH,Latin/o/a/xs have historically used grassroots...,Pomona,100,[A. Zimmerman],Fall 2020.,,,False,0,0.481129
3637,"Political Activism, Social Movements and the P...",ANTH-138-PZ,"By examining contemporary issues, themes, and ...",Pitzer,0,[],,,,False,0,0.38931
2748,"Art, Activism, Propaganda",ARHI-184-SC,"Explores the intersection of art, political ac...",Scripps,100,[],Occasionally,,,False,0,0.304916
364,"Activism, Vocation, Justice",RLST-168-HM,The histories of social change activism are fi...,HarveyMudd,150,[Dyson],,Instructor permission,,False,0,0.258028
3710,Introduction to Asian American Studies,ASAM-101-PZ,Introduction to the field of Asian American St...,Pitzer,100,[Rosanna Simons],,,,True,0,0.223309
2536,Social and Political Movements,SOC-075-PO,Can activism from below change society and pol...,Pomona,100,[C. Beck],Last offered fall 2018.,,,False,0,0.219529
3421,Remaking the Self,POLI-173-SC,How do social movements change the world by ch...,Scripps,100,[],Occasionally,,,False,0,0.198564
2540,"Los Angeles Communities: Transformations, Ineq...",SOC-114-CH,Use of case study approach to explore the inte...,Pomona,100,"[Jeffrey D. Groves, Frank Lubbock Miller, Fran...",Last offered spring 2018.,Any course in Chicanx-Latinx Studies or Sociology,,True,0,0.175725
354,Global Environmental Politics,POST-140-HM,Analyzes the political dynamics driving global...,HarveyMudd,300,[Steinberg],,,,False,0,0.174303
3169,African American Women in the United States,HIST-171-AF,This course explores the distinctive and diver...,Scripps,100,[],,,,False,0,0.172807


In [448]:
tfidf_word('fiction', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
915,Advanced Creative Writing,LIT-181-CM,This is a class for the student who is serious...,ClaremontMckenna,100,[],Every year,,,False,0,0.400743
4424,Journalism in Latin America,SPAN-162-PZ,Better than Fiction: Journalism in Latin America,Pitzer,0,[],,,,False,0,0.372122
838,Introduction to Creative Writing,LIT-031-CM,This course offers the chance to explore three...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.365195
3050,Introduction to Fiction Writing,ENGL-193-SC,This is an introductory course on writing shor...,Scripps,100,[Leila Mansouri],,,,True,0,0.28716
3051,Advanced Fiction Writing Workshop,ENGL-194S-SC,This advanced fiction workshop is intended for...,Scripps,0,[],,,,False,0,0.277572
214,Fiction Workshop,LIT-035-HM,This course is designed as an introductory wor...,HarveyMudd,300,[Salvador Plascencia],Fall and Spring,,,True,0,0.276736
607,The Francophone Caribbean,FREN-115-CM,A study of works of writers and artists from H...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.274805
2500,Post-Soviet Russian Culture and Society,RUSS-182-PO,The course explores the major changes in Russi...,Pomona,100,[Larissa V. Rudova],Spring 2022.,RUSS 044 PO,,True,0,0.258057
917,Advanced Fiction Writing,LIT-183-CM,This advanced fiction workshop is intended for...,ClaremontMckenna,100,[Mary Gaitskill],Every year,,,True,0,0.255142
889,European Modernist Fiction,LIT-122-CM,The first half of the 20th century produced an...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.239483


In [449]:
tfidf_word('environment', data).head(10)

Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score
1809,Food and the Environment in Asia and the Pacific,HIST-101F-PO,A single question inspired this seminar: what ...,Pomona,100,[S. Yamashita],Last offered spring 2018.,,,False,0,0.339513
2815,Microbiology,BIOL-168L-KS,In this fundamental microbiology course we wil...,Scripps,100,[],Every year,BIOL043L KS or BIOL040L KS ; BIOL044L KS ; CH...,,False,0,0.287316
1047,Environmental Ethics,PHIL-187-CM,An exploration of human beings’ ethical relati...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.286515
456,Microbiology,BIOL-168L-KS,In this fundamental microbiology course we wil...,ClaremontMckenna,100,[Pete Chandrangsu],Occasionally,,,True,0,0.277818
3904,Critical Environmental Analysis,EA-150-PZ,A seminar examination of how environmental iss...,Pitzer,0,[],,,,False,0,0.257127
2559,"Africa, the Environment, and the Global Economy",SOC-189H-PO,"Drawing on sociology and related disciplines, ...",Pomona,100,[S. Stefanos],Each fall.,,,False,0,0.221758
3883,Urban Ecology,EA-098-PZ,Urban ecology is a subfield of ecology that de...,Pitzer,100,[Heather Campbell],,,,True,0,0.206919
3616,Global Environmental Conflict,ANTH-082-PZ,This class uses the tools of anthropology and ...,Pitzer,0,[],,,,False,0,0.20558
3600,Native Americans and Their Environments,ANTH-012-PZ,This course will investigate the traditional i...,Pitzer,100,[Sheryl Miller],,,,True,0,0.199418
325,Topics in Physics,PHYS-080-HM,"An area of physics is studied, together with i...",HarveyMudd,300,"[Donnelly, Saeta]",,PHYS051 HM,,False,0,0.196412


### Rank classes based on a given class
`tfidf_id(id, data)` will take in a identifier of some course (i.e. "PHIL-187-CM") and return courses that are most similar to the givern course. To do it, we will look at which words ranked highest for the given class and find other classes where same word also ranked highest.

Similarly to `tfidf_word`, we will add new columns that have the score of each class based on each word. In thins case we will have multiple new columns that have similarity scores for different word that were important in our input class.

In [450]:
def tfidf_word_helper(word, data1, prefix=""):
    data2 = data1.copy()
    corpus = list(data2.description)
    vectorizer = TfidfVectorizer(use_idf=True)
    vectors = vectorizer.fit_transform(corpus)

    score_for_word = []
    words = vectorizer.get_feature_names()
    try:
        index = words.index(word)
    except:
        print("'" + word + "'" + " is not mentioned in any course descriptions")
        return

    for i in range(0, len(corpus)):
        value = vectors[i].T.todense()[index]
        score_for_word.append(value)

    score_for_word = [float(i) for i in score_for_word] # type cast each score to a float

    data2["score"+prefix] = score_for_word
    return data2

In [451]:
def tfidf_id(id, data_):
    index_id = list(data_.index[data_["identifier"] == id]) # index of input class
    
    if len(index_id) == 0:
        print("Couldn't find a course " + id)
        return
    else:
        index_id = index_id[0]
    print(data_.loc[index_id, "identifier"])
    print(data_.loc[index_id, "description"])

    corpus = list(data_.description)
    vectorizer = TfidfVectorizer(use_idf=True)
    vectors = vectorizer.fit_transform(corpus)
    words = vectorizer.get_feature_names()

    scores_id = vectors[index_id].T.todense() # score values for our given class
    scores_id = [float(i) for i in scores_id] # type cast to float
    score_for_word = {}

    
    for i in range(0, len(words)):
        if scores_id[i] > 0.2: # We will count a word as relevant if it's score if more than 0.2 (arbitrary value, subject to change)
            score_for_word[words[i]] = [scores_id[i], i]
    score_for_word = {k: v for k, v in sorted(score_for_word.items(), key=lambda item: item[1][0], reverse=True)} #ordear dict by descending vals
    print("Most important words in course "+ id + " and their scores and indices")
    print(score_for_word)

    data__ = data_.copy() ## since we're making changes to our dataframe, we don't want to save these changes in the original dataframe
    new_cols = []
    for word in score_for_word:
        new_cols.append("score_" + word)
        data__ = tfidf_word_helper(word, data__, "_"+word) # tf idf word helper will help us add new columns with prefixes, same algo as regular tfdf_word()

    data__ = data__.sort_values(by=new_cols, ascending= False)
    return data__
    

In [452]:
# This one works
tfidf_id("AMST-120-HM", data).head(10)

AMST-120-HM
A focus on the experience of immigrants in the United States and Americans of diverse ethnic backgrounds, as reflected in literature and critical theory. The course will weave together works that treat the lives of immigrants and minority groups in the United States with examinations of such contemporary issues as bilingual education, the conditions of migrant workers, and children as cultural and linguistic interpreters for their parents. The intentionally broad and interdisciplinary nature of the course enables exploration of cultural identities, socio-economic status, and gender-specific roles.
Most important words in course AMST-120-HM and their scores and indices
{'immigrants': [0.26855708018539415, 7214], 'the': [0.20853700869471273, 14088]}


Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score_immigrants,score_the
2364,The Politics of Immigration and Citizenship,POLI-046-PO,Examines immigration and citizenship politics ...,Pomona,100,[Staff],Last offered spring 2019.,,,False,0,0.324335,0.031481
2,Hyphenated Americans,AMST-120-HM,A focus on the experience of immigrants in the...,HarveyMudd,300,[Balseiro],,,,False,0,0.268557,0.208537
3209,Italians as Guests and Hosts: Intercultural En...,ITAL-136-SC,This course examines the phenomenon of exchang...,Scripps,100,[],,ITAL 044 or equivalent.,,False,0,0.236742,0.114895
1192,Images of Immigration in Spanish Literature an...,SPAN-122-CM,"From an interdisciplinary perspective, this co...",ClaremontMckenna,100,[],Every other year,,,False,0,0.20955,0.122038
3927,Criminalization of Immigrants,FS-016-PZ,How did immigration and the U.S. - Mexico bord...,Pitzer,100,[Steffanie Guillermo],,,,True,0,0.17458,0.22029
4260,US Immigration and Transnational Politics,POST-174-CH,Examines the factors shaping the size and comp...,Pitzer,0,[],,,,False,0,0.168014,0.195696
1789,US Immigarion History,HIST-029-PO,This introductory seminar examines the history...,Pomona,100,[Staff],Fall 2021.,,,False,0,0.156974,0.243783
3748,Immigration from and#8220;The Tropicsand#8221;...,CHLT-120-PZ,This class will focus on the immigration movem...,Pitzer,100,[Michael J. Ballagh],,,,True,0,0.155281,0.150721
771,"Asian American History, 1850 to the Present",HIST-125-CM,This survey course examines the history of Asi...,ClaremontMckenna,100,[],Occasionally,,,False,0,0.153383,0.119104
4262,Immigration and Race in America,POST-175-CH,America has long prided itself in being a nati...,Pitzer,0,[],,,,False,0,0.15244,0.088778


In [453]:
# Doesn't work. Gives wrong word scores for some reason
tfidf_id("CSCI-036-CM", data).head(10)

CSCI-036-CM
Data science is the interdisciplinary study of the tools and theory behind using data to extract knowledge. It combines ideas from statistics, computer science, and particular domains in the hard and social sciences in order to make predictions and optimal decisions. This course covers the foundations of data science including the basics of how to structure, visualize, transform, and model data.
Most important words in course CSCI-036-CM and their scores and indices
{'climate': [0.5680111476749007, 2808], 'change': [0.4435747516841977, 2519], 'earth': [0.3197073895630546, 4660]}


Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score_climate,score_change,score_earth
65,"Global Climate Change: Non-linearity, Irrevers...",CHEM-041-HM,Principles of the chemical and physical basis ...,HarveyMudd,300,"[Lisa M. Sullivan, Thomas David Donnelly]",Fall,,,True,0,0.654041,0.306454,0.0
533,Global Climate Change,EA-100L-KS,Introduction to the Earth Sciences with a focu...,ClaremontMckenna,100,[],Every year,,,False,0,0.579752,0.362195,0.261053
532,Global Climate Change,EA-100-KS,"An introduction to the earth sciences, this co...",ClaremontMckenna,100,[],Every year,,,False,0,0.568011,0.443575,0.319707
2993,Global Climate Change w/Lab,EA-100L-KS,"An introduction to the Earth Sciences, this co...",Scripps,100,[],Every year,BIOL043L KS and BIOL044L KS ; or BIOL040L KS ...,,False,0,0.564488,0.352659,0.25418
1728,Climate Change,GEOL-020G-PO,An integrated perspective of Earth’s dynamic c...,Pomona,100,[W. McLaughlin],Spring 2022.,,,False,0,0.529256,0.0,0.44684
544,Modeling Climate Change,ECON-090-CM,A deep understanding of climate change require...,ClaremontMckenna,100,[Mark Vinci],Every other year,,,True,0,0.453498,0.265611,0.127627
1367,Global Change Biology,BIOL-189E-PO,Global Climate Change Biology. Relying on scie...,Pomona,100,"[F. Hanzawa, N. Karnovsky]",Fall 2021.,BIOL 041E PO,,False,0,0.447997,0.349853,0.0
2434,Seminar: Psychology of Climate Change,PSYC-180C-PO,This seminar will explore psychological perspe...,Pomona,100,[A. Pearson],Last offered spring 2021.,PSYC 051 PO,,False,0,0.373708,0.194559,0.0
324,Climate and Energy,PHYS-078-HM,Our climate’s dominant behavior is determined ...,HarveyMudd,300,[Donnelly],,,,False,0,0.358309,0.0,0.067225
2433,"Climate of Change: Climate Science, Psychology...",PSYC-180C-JT,This cross-disciplinary seminar will explore t...,Pomona,100,[A. Pearson],Fall 2021.,,,False,0,0.230465,0.179976,0.0


In [454]:
# Also doesn't work
tfidf_id("LIT-138-CM", data).head(10)

LIT-138-CM
This course will examine film as art and as medium in the context of the rise of 20th-century “mass culture.” We will take up such topics as the role of film in producing the ideas of “mass culture”, the cinematic representation of the “masses”, film as an instrument of the standardization of culture and as a mode of resistance to it, film and modernism, film and postmodernism, representations of fascism in cinema, and “subculture” considered as an effect of mass culture.
Most important words in course LIT-138-CM and their scores and indices
{'accounting': [0.46959270338428, 539], 'financial': [0.39176367047494914, 5729], 'the': [0.21810273324145757, 14088]}


Unnamed: 0,title,identifier,description,source,credits,instructors,offered,prerequisites,corequisites,currently_offered,fee,score_accounting,score_financial,score_the
3798,Accounting and Finanicial Analysis,ECON-087-PZ,Examines the role of accounting information in...,Pitzer,0,[],,,,False,0,0.566749,0.236409,0.151862
575,Financial Statement Analysis,ECON-154-CM,Combines finance and accounting in a user-orie...,ClaremontMckenna,100,[Peter J McAniff],Every year,,,True,0,0.505073,0.105341,0.067668
577,Accounting Ethics,ECON-156-CM,A case-method survey of ethical problems confr...,ClaremontMckenna,100,[Gary Raymond Birkenbeuel],Occasionally,,,True,0,0.492398,0.13693,0.02932
929,Financial Reporting and Communication,FIN-386-CM,This course will introduce students to the lan...,ClaremontMckenna,100,[],Every fall,,,False,0,0.469593,0.391764,0.218103
1546,Managerial Accounting Financial Analysis,ECON-117-PO,Examines the role of accounting information in...,Pomona,100,[Richard S. Savich],Each spring.,,,True,0,0.446066,0.186068,0.119525
935,Advanced Accounting Analysis,FIN-440-CM,The focus of this course is the connection bet...,ClaremontMckenna,100,[],Every spring,,,False,0,0.417751,0.0,0.074625
572,Asset and Income Measurement (Intermediate Acc...,ECON-150-CM,This course examines both conceptual foundatio...,ClaremontMckenna,100,[Gregory Lonzo],Every semester,,,True,0,0.37817,0.157747,0.101332
571,"International Accounting, Taxation, and Transf...",ECON-149-CM,"An introduction to global accounting, cross-li...",ClaremontMckenna,100,[Bernadette LeGrant],Occasionally,,,True,0,0.373339,0.0,0.100037
579,Accounting Theory and Research,ECON-159-CM,An intensive study of the evolution and develo...,ClaremontMckenna,100,[],Every year,,,False,0,0.270099,0.0,0.192996
923,Leadership Development in Finance and Accounting,FIN-301B-CM,Continuation of FIN301A CM - Leadership Develo...,ClaremontMckenna,25,[],Every Spring,,,False,0,0.261592,0.0,0.0
