jiananarthurli/insight_api

Weekendpedia Django API

This repo holds the source code of the Django API server for the Insight Data Science project Weekendpedia. The source code for the Chrome extension is in a separate repo.

Weekendpedia is a Chrome extension that recommends cultural events (in galleries, museums, etc.) in New York City to Wikipedia users. The extension tracks the Wikipedia topic that the user is currently viewing and alerts the user when a relevant cultural event is found.

This YouTube video demonstrates how the extension interacts with users.

This repo contains the Django backend for the API server (in ./insight_api_src). The event data is scraped from nyc.com and stored in ./insight_api_data.

Recommendation algorithm

The server uses keyword extraction and TF-IDF for content recommendation. Keywords are captured using named entity recognition (NER) and part-of-speech (PoS) tagging. The IDF space is defined by the keywords from the event descriptions, so only terms that occur in at least one event description carry weight. The cosine similarity between the TF-IDF feature vector of the wiki article and that of each event description is calculated. An event is returned to the user when its similarity is higher than the threshold.
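The scoring step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the sample keywords, and the smoothed IDF formula are hypothetical and are not taken from the repo, and the keyword lists are assumed to have already been produced by the NER and PoS tagging step.

```python
import math
from collections import Counter


def build_idf(event_keywords):
    """Compute IDF weights from the event keyword lists only (the IDF space)."""
    n_docs = len(event_keywords)
    doc_freq = Counter()
    for keywords in event_keywords:
        doc_freq.update(set(keywords))  # count each term once per event
    # Smoothed IDF; an assumption, the repo's exact formula is not stated.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(keywords, idf):
    """Build a sparse TF-IDF vector; terms outside the event vocabulary are dropped."""
    counts = Counter(k for k in keywords if k in idf)
    return {t: tf * idf[t] for t, tf in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword lists standing in for real event descriptions.
events = [
    ["monet", "impressionism", "painting"],
    ["jazz", "saxophone", "concert"],
]
idf = build_idf(events)
event_vectors = [tfidf_vector(e, idf) for e in events]

# Hypothetical keywords extracted from a wiki intro; "france" is not in the IDF space.
wiki_vector = tfidf_vector(["monet", "painting", "france"], idf)
scores = [cosine_similarity(wiki_vector, v) for v in event_vectors]
```

Because the IDF weights and the event vectors depend only on the event data, they can be computed once and reused for every incoming wiki article, which matches the pre-calculation described under "API service".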

More details are explained in the notebook and the slides.

Service details

API service

When the user is viewing a Wikipedia page, the Chrome extension sends the page URL to the Django server. The wiki topic is extracted from the URL, and the intro text of the corresponding wiki page is retrieved using the API provided by Wikipedia. The text is converted to a feature vector using the pre-calculated IDF weights of the keywords from the event descriptions. The recommender then calculates the cosine similarity between this feature vector and each of the pre-calculated feature vectors of the events. For every event whose similarity is higher than the threshold, the recommender retrieves the event information (name, link, etc.) from the PostgreSQL server linked to the Django server and returns it to the user's Chrome extension as JSON strings. The IDF weights, the feature vectors of the events and the PostgreSQL database are updated whenever the event data is updated.

Components of the server

The Django API server has three main components: extractor, vectorizer and recommender.

The extractor (./insight_api_src/extractor/) retrieves the plain text of the Wikipedia topic that the user is viewing, using the API provided by Wikipedia. The functions are defined in ./insight_api_src/extractor/views.py.
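As a rough sketch of what the extractor does, the snippet below pulls the topic out of a Wikipedia URL and builds a request URL for the intro text. The helper names are hypothetical, and the choice of the MediaWiki `action=query` endpoint with `prop=extracts` is an assumption; the README does not say which Wikipedia API call the repo uses. No network request is made here.

```python
from urllib.parse import unquote, urlencode, urlparse

# English Wikipedia's MediaWiki API endpoint (assumed; other language editions differ).
WIKI_API = "https://en.wikipedia.org/w/api.php"


def topic_from_url(url):
    """Return the article title from a Wikipedia URL, or None if it is not one."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if not is_wikipedia or not parsed.path.startswith("/wiki/"):
        return None
    title = parsed.path[len("/wiki/"):]
    return unquote(title).replace("_", " ")


def intro_query_url(topic):
    """Build a query URL asking for the plain-text intro of an article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # intro section only
        "explaintext": 1,  # plain text instead of HTML
        "format": "json",
        "titles": topic,
    }
    return WIKI_API + "?" + urlencode(params)


topic = topic_from_url("https://en.wikipedia.org/wiki/Claude_Monet")
request_url = intro_query_url(topic)
```

The returned URL can then be fetched with any HTTP client, and the intro text read from the JSON response before it is handed to the vectorizer.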

The vectorizer (./insight_api_src/vectorizer/) converts the text into a feature vector using the TF-IDF algorithm (details are explained in the notebook in ./recommender_prototype) and sends it to the recommender. The functions are defined in ./insight_api_src/extractor/views.py.

The recommender (./insight_api_src/vectorizer/) calculates the cosine similarities between the feature vector of the wiki text and the pre-calculated feature vectors of the events. Events are recommended when their similarities are higher than the threshold. The event information (name, description, link, etc.) is retrieved from the PostgreSQL server and returned as JSON strings. The functions are defined in ./insight_api_src/extractor/views.py.
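The thresholding and JSON step of the recommender might look roughly like the sketch below. Everything in it is illustrative: the `recommend` function, the threshold value of 0.2, the field names, and the sample events are hypothetical, and plain dicts stand in for rows that the real server reads from PostgreSQL.

```python
import json


def recommend(similarities, events, threshold=0.2):
    """Return a JSON string with the events whose similarity exceeds the threshold."""
    matches = [
        {**event, "similarity": round(score, 3)}
        for score, event in zip(similarities, events)
        if score > threshold
    ]
    # Most relevant events first.
    matches.sort(key=lambda m: m["similarity"], reverse=True)
    return json.dumps({"events": matches})


# Hypothetical event records; the real ones come from the database.
sample_events = [
    {"name": "Impressionism Now", "link": "https://example.com/a"},
    {"name": "Late Night Jazz", "link": "https://example.com/b"},
    {"name": "Monet and Water", "link": "https://example.com/c"},
]
payload = recommend([0.5, 0.1, 0.8], sample_events)
```

In a Django view, the same dictionary could be returned with `django.http.JsonResponse` instead of calling `json.dumps` directly.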
