MoviesSearch

A simple movie search engine that ranks results with term frequency-inverse document frequency (TF-IDF).
I saw my cousin building one, got curious, and tried to build my own.
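
For readers new to TF-IDF, here is a minimal sketch of how the weights can be computed. It is not the code from this repository: the function name `tf_idf`, the length-normalised term frequency, and the `log(N / df)` form of the inverse document frequency are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    `docs` is a list of token lists. Assumed weighting (illustrative only):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, made up for this example.
docs = [
    ["toy", "story", "woody", "buzz"],
    ["toy", "soldiers"],
    ["moon", "story"],
]
weights = tf_idf(docs)
```

In this toy corpus "woody" occurs in one document and "toy" in two, so "woody" gets the higher weight in the first document. Rare terms count for more than common ones, which is the point of the weighting.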

Prepare data

rem "download general data from: https://drive.google.com/file/d/1CZsJGWS9hZ7z2t_fJcmxnn-fVo_EtM5P/view?usp=sharing"
rem "create a folder named 'data'"
rem "move the downloaded csv file into 'data'"

python prepare_data/create_tokenize_data.py --csv-in ./data/general_movies_data.csv --csv-out ./data/tokenized_data.csv
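
The internals of `create_tokenize_data.py` are not described here, so the following is only a rough sketch of what a tokenization pass over a CSV file might look like. The column names `id` and `overview`, the helper names `tokenize` and `tokenize_csv`, and the lowercase-alphanumeric tokenization rule are all hypothetical. The real script and data may differ.

```python
import csv
import io
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tokenize_csv(csv_in, csv_out, text_column="overview", id_column="id"):
    """Read rows from csv_in and write (id, space-joined tokens) to csv_out.

    Both arguments are open text file objects. The column names are
    assumptions made for this example.
    """
    reader = csv.DictReader(csv_in)
    writer = csv.writer(csv_out)
    writer.writerow([id_column, "tokens"])
    for row in reader:
        writer.writerow([row[id_column], " ".join(tokenize(row[text_column]))])


# In-memory stand-ins for the input and output files.
sample = io.StringIO('id,overview\n1,"Woody and Buzz Lightyear!"\n')
out = io.StringIO()
tokenize_csv(sample, out)
```

To use real files instead of the in-memory buffers, open both with `newline=""`, as the `csv` module documentation recommends.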

How to run

>>> from MoviesSearch import MoviesSearchEngine
>>> search_engine = MoviesSearchEngine("path/to/tokenized/data", "path/to/general/data")
>>> search_engine.search("woody and buzz lightyear")
[(1, 'Toy Story'), (2, 'Toy Story 3'), (3, 'Toy Story 2'), (4, 'In the Shadow of the Moon'), (5, 'For Your Consideration')]
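
The result above is a list of `(rank, title)` pairs. As a rough illustration of how such a ranking can be produced, here is a small self-contained sketch built on an inverted index. `TinySearchEngine` is a hypothetical class written for this example and is not the implementation of `MoviesSearchEngine`. The scoring (raw term count times `log(N / df)`) and the toy data are assumptions as well.

```python
import math
from collections import Counter, defaultdict


class TinySearchEngine:
    """Hypothetical TF-IDF search over pre-tokenized documents."""

    def __init__(self, titles, token_lists):
        self.titles = titles
        self.n_docs = len(titles)
        # Inverted index: term -> {doc_id: term count in that document}.
        self.index = defaultdict(dict)
        for doc_id, tokens in enumerate(token_lists):
            for term, count in Counter(tokens).items():
                self.index[term][doc_id] = count

    def search(self, query, top_k=5):
        scores = defaultdict(float)
        for term in query.lower().split():
            postings = self.index.get(term, {})
            if not postings:
                continue  # the term does not occur in any document
            idf = math.log(self.n_docs / len(postings))
            for doc_id, count in postings.items():
                scores[doc_id] += count * idf
        # Highest score first; ties are broken by document id.
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [(rank, self.titles[d])
                for rank, d in enumerate(ranked[:top_k], start=1)]


# Toy data, made up for this example.
engine = TinySearchEngine(
    ["Toy Story", "Toy Soldiers", "Moon Story"],
    [["toy", "story", "woody", "buzz"], ["toy", "soldiers"], ["moon", "story"]],
)
results = engine.search("woody and buzz")
```

Only documents that contain at least one query term are scored, which is the main benefit of the inverted index: the engine never has to scan documents that cannot match.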
