GitHub - slvkmr/Hotels_Analysis: Take away assignment for a data engineer role

Hotels Analysis

Input:

A JSON file describing hotels in India, with 9 fields and 305 rows. The number of rows is subject to change, so the code must scale with the size of the data. The fields are:

HOTEL NAME
ADDRESS
STATE
PHONE
FAX
EMAIL ID
WEBSITE
TYPE
Rooms
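Loading the file into pandas could look like the sketch below. It assumes, since the README does not say, that hotels.json is a single JSON array of objects keyed by the field names above, and that Rooms may be stored as text. The helper names hotels_from_records and load_hotels are hypothetical and not taken from the notebooks.

```python
import json

import pandas as pd

# Field names as listed above; the JSON layout itself is an assumption.
FIELDS = ["HOTEL NAME", "ADDRESS", "STATE", "PHONE", "FAX",
          "EMAIL ID", "WEBSITE", "TYPE", "Rooms"]


def hotels_from_records(records: list) -> pd.DataFrame:
    """Build a DataFrame with the expected columns from a list of dicts."""
    # reindex(columns=...) fixes the column order and fills missing fields with NaN.
    df = pd.DataFrame.from_records(records).reindex(columns=FIELDS)
    # Rooms may be stored as text; coerce to numbers (unparseable values become NaN).
    df["Rooms"] = pd.to_numeric(df["Rooms"], errors="coerce")
    return df


def load_hotels(path: str) -> pd.DataFrame:
    """Read a JSON array of hotel objects from `path` (assumed layout)."""
    with open(path, encoding="utf-8") as fh:
        return hotels_from_records(json.load(fh))
```

Usage under that assumption would be `df = load_hotels("hotels.json")`.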

Functionality:

The solution processes the JSON file and builds a pandas DataFrame from the extracted data. From the DataFrame, it computes the hotels with the maximum number of rooms for each TYPE. Possible approaches:

1. Naive approach: load the entire dataset into memory and run the computation.
2. Distributed approach: add more machines and split the processing among them.
3. Chunked approach: divide the data into chunks that fit in memory and process them sequentially.
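For the first (in-memory) approach, the per-type maximum maps onto a pandas groupby. The sketch below assumes a DataFrame with TYPE, HOTEL NAME and numeric Rooms columns; the function name max_rooms_per_type is hypothetical and this is not necessarily how hotels_analysis.ipynb computes it. Comparing against `transform("max")` rather than using `idxmax` keeps every hotel that ties for the maximum within its type.

```python
import pandas as pd


def max_rooms_per_type(df: pd.DataFrame) -> pd.DataFrame:
    """Return the hotel(s) with the most rooms within each TYPE (ties kept)."""
    # Ignore rows whose room count is missing or could not be parsed.
    df = df.dropna(subset=["Rooms"])
    # Broadcast each type's maximum back onto its rows.
    type_max = df.groupby("TYPE")["Rooms"].transform("max")
    return (
        df.loc[df["Rooms"] == type_max, ["TYPE", "HOTEL NAME", "Rooms"]]
        .sort_values(["TYPE", "HOTEL NAME"])
        .reset_index(drop=True)
    )
```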

The submitted solution implements the first and third approaches. The first approach is in the notebook hotels_analysis.ipynb and the third approach is in the notebook hotels_analysis_large_files.ipynb.
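The third approach only needs to keep the current chunk plus the running per-type winners in memory. The sketch below illustrates that idea under the same column assumptions as above; the function name is hypothetical and it is not the code from hotels_analysis_large_files.ipynb.

```python
from typing import Iterable

import pandas as pd

COLS = ["TYPE", "HOTEL NAME", "Rooms"]


def max_rooms_per_type_chunked(chunks: Iterable[pd.DataFrame]) -> pd.DataFrame:
    """Compute the max-rooms hotel(s) per TYPE, one chunk at a time (ties kept)."""
    best = None  # running winners from all chunks seen so far
    for chunk in chunks:
        chunk = chunk[COLS].copy()
        chunk["Rooms"] = pd.to_numeric(chunk["Rooms"], errors="coerce")
        chunk = chunk.dropna(subset=["Rooms"])
        if chunk.empty:
            continue
        # Combine previous winners with the new chunk, then keep only the
        # rows that equal their type's maximum.
        combined = chunk if best is None else pd.concat([best, chunk], ignore_index=True)
        type_max = combined.groupby("TYPE")["Rooms"].transform("max")
        best = combined.loc[combined["Rooms"] == type_max]
    if best is None:
        return pd.DataFrame(columns=COLS)
    return best.sort_values(["TYPE", "HOTEL NAME"]).reset_index(drop=True)
```

Memory use stays bounded by the chunk size plus the number of distinct types (and ties). If the input were stored as JSON Lines (one object per line), which is an assumption about hotels.json, `pd.read_json("hotels.json", lines=True, chunksize=100)` yields DataFrame chunks that could be passed straight to this function; pandas only supports `chunksize` together with `lines=True`.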

Output:

The computed output DataFrame is stored as a CSV file. result.csv and result_1.csv are the output files.

Repository files:

README.md
hotels.json
hotels_analysis.ipynb
hotels_analysis_large_files.ipynb
result.csv
result_1.csv
