# Dataset

Download the New York City dataset from InsideAirbnb and build an inverted index with Whoosh.
The function `populate_index` handles both steps. It requires the name of the directory where the index will be stored,
usually `index` in the current working directory.
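For readers new to the term, an inverted index maps each term to the documents that contain it. The sketch below is a plain-Python illustration with made-up listing texts. It is not how Whoosh or `populate_index` is implemented, and `build_inverted_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


# Made-up listing texts, for illustration only.
toy_docs = {
    1: "quiet penthouse near the park",
    2: "sunny studio near the subway",
}
toy_index = build_inverted_index(toy_docs)

print(toy_index["near"])       # [1, 2]
print(toy_index["penthouse"])  # [1]
```

Looking up a term is then a single dictionary access, which is why the index is built once and reused.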

`.gitignore` tells git to ignore the top-level subdirectory `index`, where the inverted index for this project should be stored.

Once the dataset has been downloaded and the inverted index built, you don't need to rebuild it: load it directly with Whoosh's `open_dir`.

In [47]:
from placerank.dataset import populate_index

Download the dataset source and build the index. Currently the keys `["id", "name", "description", "neighborhood_overview"]` are indexed. To index more keys, edit the inverted index schema and add the new keys to `placerank.dataset.DocumentLogicView`.

In [48]:
populate_index("index")

## Search

To search in the inverted index, we have to open it first.

In [49]:
from whoosh.index import open_dir

ix = open_dir("index")

Two more objects are needed to run queries: a `QueryParser`, which turns a query string into a query object, and a `Searcher`, which runs that query against the index.

In [50]:
from whoosh.qparser import QueryParser

parser = QueryParser("neighborhood_overview", ix.schema)

This query parser searches for terms in the `neighborhood_overview` field only. To run a query, parse the input string, pass the result to a searcher, and print the name of each hit.

In [51]:
UIN = "penthouse"

query = parser.parse(UIN)
with ix.searcher() as searcher:
    results = searcher.search(query)
    print(*[hit.get("name") for hit in results], sep='\n')

Condo in New York · ★5.0 · 1 bedroom · 1 bed · 1.5 baths
Rental unit in New York · ★4.47 · 1 bedroom · 1 bed · 1 shared bath
Rental unit in New York · 6 bedrooms · 10 beds · 8 baths
Rental unit in New York · 3 bedrooms · 3 beds · 2 baths
Resort in New York · Studio · 1 bed · 1 private bath
Resort in New York · 1 bedroom · 1 bed · 1 private bath
Rental unit in Brooklyn · 1 bedroom · 2 beds · 1.5 baths
Rental unit in Queens · 1 bedroom · 1 bed · 1 shared bath
Condo in New York · ★4.60 · 1 bedroom · 3 beds · 2 baths
Rental unit in New York · ★4.98 · 2 bedrooms · 2 beds · 2 baths


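### Sentiment

The query can also be passed to `GoEmotionsClassifier`, which labels the emotions it expresses. The cell below runs the search and prints the classifier's labels after the hit names.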

In [63]:
from sentimentModule.sentiment import GoEmotionsClassifier
classifier = GoEmotionsClassifier()

UIN = "beautiful, quiet and peaceful apartment"

query = parser.parse(UIN)
with ix.searcher() as searcher:
    results = searcher.search(query)
    # Classify the emotions expressed in the parsed query string.
    sentiment = classifier.classify_texts(str(query))
    # Print the name of each hit, then the sentiment predictions on the last line.
    print(*[hit.get("name") for hit in results], sentiment, sep='\n')

Rental unit in New York · ★4.71 · 1 bedroom · 1 bed · 1 bath
Rental unit in New York · 1 bedroom · 3 beds · 1 bath
Home in Brooklyn · ★4.67 · 1 bedroom · 1 bed · 1 shared bath
Serviced apartment in New York · ★4.81 · 3 bedrooms · 4 beds · 2 baths
Rental unit in Staten Island  · ★4.88 · 1 bedroom · 1 bed · 1.5 shared baths
[[{'label': 'admiration', 'score': 0.991719126701355}, {'label': 'neutral', 'score': 0.5229455828666687}]]
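The last line of the output is a nested list of label/score dictionaries. A small helper can reduce it to the strongest label per text. This sketch assumes the outer list holds one entry per classified text, which the output suggests but the source does not confirm. `top_emotions` is a hypothetical helper, not part of `sentimentModule`.

```python
def top_emotions(predictions):
    """Return the highest-scoring label for each classified text."""
    return [
        max(labels, key=lambda item: item["score"])["label"]
        for labels in predictions
    ]


# Predictions copied from the output above.
example_output = [[{'label': 'admiration', 'score': 0.991719126701355},
                   {'label': 'neutral', 'score': 0.5229455828666687}]]

print(top_emotions(example_output))  # ['admiration']
```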
