In [15]:
import json
import pickle
import string

import nltk
import numpy as np
import pandas as pd
from IPython.display import HTML
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.metrics.pairwise import cosine_similarity

# Fetch the NLTK resources used by process_query (a no-op if already present).
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')


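def process_query(query):
    """Turn a free-text query into a binary bag-of-words vector over word_dict."""
    # Lowercase and tokenize, then drop English stop words.
    query = word_tokenize(query.lower())
    stop = set(stopwords.words('english'))
    query = [item for item in query if item not in stop]
    # Strip punctuation characters and drop purely numeric tokens.
    query = [''.join(c for c in s if c not in string.punctuation) for s in query]
    query = [c for c in query if not c.isdigit()]
    # Lemmatize, then stem, so tokens match the keys stored in word_dict.json.
    wordnet_lemmatizer = WordNetLemmatizer()
    query = [wordnet_lemmatizer.lemmatize(w) for w in query]
    porter_stemmer = PorterStemmer()
    query = [porter_stemmer.stem(w) for w in query]
    # word_dict maps each vocabulary token to its column index.
    with open('word_dict.json', 'r') as f:
        word_dict = json.load(f)
    # Out-of-vocabulary tokens (including empty strings left by punctuation stripping) are skipped.
    query_word_id = [word_dict[i] for i in query if i in word_dict]
    query_vec = np.zeros(len(word_dict))
    if len(query_word_id) > 0:
        query_vec[query_word_id] = 1
    return query_vec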
    

print('Keywords for Course Search:')
query = input()
query_vec = process_query(query)
tfidf = np.load('tfidf.npy')
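# Cosine similarity between the query (1 x vocabulary) and every course row of the TF-IDF matrix.
similarity = cosine_similarity(query_vec[np.newaxis, :], tfidf)
with open('../course_id.pkl', 'rb') as f:
    course_dict = pickle.load(f)
course_id = course_dict['course_id']
id_course = course_dict['id_course']
# Course indices ordered from most to least similar.
ranking = similarity[0].argsort()[::-1].tolist()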

with open('../courseId_description.json', 'r') as f:
    course_des = json.load(f)

# DataFrame.append was removed in pandas 2.0, so collect the rows first and build the frame once.
rows = [
    {'Subject & Number': id_course[i], 'Title & Description': course_des[str(i)]}
    for i in ranking[:20]
]
returned_courses = pd.DataFrame(rows, columns=['Subject & Number', 'Title & Description'])
# pd.set_option("display.max_colwidth", None)  # None (not -1) disables truncation in current pandas
# print(returned_courses)
HTML(returned_courses.to_html())

[nltk_data] Downloading package punkt to /home/jenny/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/jenny/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /home/jenny/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Keywords for Course Search:
a data science course which has low requirement for math


Unnamed: 0,Subject & Number,Title & Description
0,Sociology 88,"Data Science for Social Impact: This course explores the role of social research in policymaking and public decisions and develops skills for the communication of research findings and their implications in writing and through data visualization. Students will develop an understanding of various perspectives on the role that data and data analysts play in policymaking, learn how to write for a public audience about data, results, and implications, and learn how to create effective and engaging data visualizations.\n\nData Science Connector: This course builds on the Foundations of Data Science course by teaching more advanced data visualization skills and techniques, by providing an understanding of how data is used, and by teaching how to communicate about data in writing."
1,Geography 88,"Data Science Applications in Geography: Data science methods are increasingly important in geography and earth science. This course introduces some of the particular challenges of working with spatial data arising from characteristics specific to such data. These issues will be explored in a series of modules deploying data science methods to investigate contemporary topics in geography and earth science, relating to climate science, hydrology, population census and remote sensing of environment. No prior knowledge is assumed or expected."
2,Data Science W205,"Storing and Retrieving Data : Storing, managing, and processing datasets are foundational processes in data science. This course introduces the fundamental knowledge and skills of data engineering that are required to be effective as a data scientist. This course focuses on the basics of data pipelines, data pipeline flows and associated business use cases, and how organizations derive value from data and data engineering. As these fundamentals of data engineering are introduced, learners will interact with data and data processes at various stages in the pipeline, understand key data engineering tools and platforms, and use and connect critical technologies through which one can construct storage and processing architectures that underpin data science applications."
3,Physics 88,"Data Science Applications in Physics: Introduction to data science with applications to physics. Topics include: statistics and probability in physics, modeling of the physical systems and data, numerical integration and differentiation, function approximation. Connector course for Data Science 8, room-shared with Physics 77. Recommended for freshmen intended to major in physics or engineering with emphasis on data science."
4,Global Studies 88,"Data Science and Global Studies: This course will examine data science ideas in the context of Global Studies. The class will teach students to work actively with data and to interpret and critique their analyses of data. Students will learn to leverage data science skills in relation to explicit ways of knowledge creation; utilize tools in basic data literacy, including misuse of statistics, intentional and unintentional; examine ways of text and natural language processing concepts through cases related to different areas of Global Studies specifically, and social sciences generally; examine complex factors that influence the way we learn, build and interpret data. Topics vary by instructor."
5,Molecular & Cell Biology 288,"Data Science for Molecular and Cell Biology: Data science is rapidly becoming a critical skill for molecular and cell biologists. This course provides a survey of data science concepts and methods, including practical statistical inference and modeling, data visualization and exploration, elementary machine learning, and simulation. The course is practically oriented. Diverse real-world datasets, along with simulated data, will be used to develop skills and intuition."
6,Linguistics 188,"LINGUISTIC DATA: How can we use data science methods to understand human language? Linguistics involves\nthe study of language sounds, words, meanings, context, structure and change. This course\nprovides students with the computational skills necessary to analyze linguistic data from\nthese areas. We will draw on data from languages around the world and use computer\nprogramming and data visualization techniques from Foundations of Data Science."
7,"Data Science, Undergraduate 88","Data Science Connector: Designed to be taken in conjunction with the Foundations of Data Science (COMPSCI/INFO/STAT C8) course, each connector course will flesh out data science ideas in the context of one particular field. Blending inferential thinking and computational thinking, the course relies on the increasing availability of datasets across a wide range of human endeavor, and students' natural interest in such data, to teach students to work actively with data in a field of their interest and to interpret and critique their analyses of data. Topics vary by field, and several topics will be offered each term."
8,Industrial Eng & Ops Rsch 235,"Data, Systems and Signals: This is an advanced project course in data science that offers a ""maker"" and/or ""innovation"" viewpoint. The course is focused first on developing an open-ended-real world project relating to data science. Related concepts of computer science tools and theoretical concepts are covered to support the project. These concepts include filtering, prediction, classification, LTI systems, and spectral analysis. After reviewing each concept, we explore implementing it in Python using libraries for math array functions, manipulation of tables, data architectures, natural language, and ML frameworks."
9,Information 206B,"Introduction to Data Structures and Analytics: The ability to represent, manipulate, and analyze structured data sets is foundational to the modern practice of data science. This course introduces students to the fundamentals of data structures and data analysis (in Python). Best practices for writing code are emphasized throughout the course. This course forms the second half of a sequence that begins with INFO 106. It may also be taken as a stand-alone course by any student that has sufficient Python experience."
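
The ranking step in the cell above can be tried without the saved files. The following is a minimal sketch on made-up data: `toy_word_dict`, `toy_tfidf`, `toy_query_vector` and `rank_courses` are hypothetical names that stand in for `word_dict.json`, `tfidf.npy` and the inline notebook code, and the matrix values are invented for illustration only.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy vocabulary (stemmed token -> column index), standing in for word_dict.json.
toy_word_dict = {'data': 0, 'scienc': 1, 'math': 2, 'histori': 3}

# Hypothetical TF-IDF matrix: one row per course, one column per vocabulary entry.
toy_tfidf = np.array([
    [0.7, 0.7, 0.0, 0.0],  # course 0: mentions "data" and "science"
    [0.0, 0.0, 1.0, 0.0],  # course 1: mentions "math" only
    [0.0, 0.0, 0.0, 1.0],  # course 2: mentions "history" only
])


def toy_query_vector(tokens, word_dict):
    """Binary bag-of-words vector: 1 for each known token, 0 elsewhere."""
    vec = np.zeros(len(word_dict))
    ids = [word_dict[t] for t in tokens if t in word_dict]
    if ids:
        vec[ids] = 1
    return vec


def rank_courses(query_vec, tfidf, top_k=20):
    """Return the top_k row indices by cosine similarity, plus all similarities."""
    sims = cosine_similarity(query_vec[np.newaxis, :], tfidf)[0]
    order = sims.argsort()[::-1][:top_k]
    return order.tolist(), sims


# 'cours' is not in the toy vocabulary, so it is ignored.
tokens = ['data', 'scienc', 'cours']
query_vec = toy_query_vector(tokens, toy_word_dict)
order, sims = rank_courses(query_vec, toy_tfidf, top_k=2)
print(order[0], sims)
```

Because the query vector is binary, every matched keyword carries the same weight, and cosine similarity normalizes away the number of keywords. If no query token is in the vocabulary, the vector is all zeros and every similarity is 0, so the resulting order carries no information. A caller may want to check `query_vec.any()` before ranking.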
