This GitHub repository gives in-depth insight into the Airbnb use case discussed in the following Medium blog post. This small repository contains the Python notebooks and Data Factory pipeline configuration files. Below is an overview of the life cycle of this use case and the files relevant to each part of the life cycle:
Here the data is extracted via the Opendatasoft API. We used the following Data Factory pipelines to extract all the cities from Opendatasoft: Pipeline_parent_opendatasoft_cities.json, and pipeline_child_get_city_listings to copy the datasets from Opendatasoft to the data lakehouse. Finally, the process_nested_json notebook is executed within the Data Factory pipeline to parse the JSON files using PySpark and write the structured tables to OneLake.
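The core idea of the process_nested_json step is turning nested API records into flat, table-shaped rows. A minimal plain-Python sketch of that flattening idea (the actual notebook uses PySpark, and the record shape and field names below are illustrative, not the real Opendatasoft schema):

```python
import json

# Toy Opendatasoft-style record; field names are made up for illustration.
raw_record = json.loads("""
{
  "record": {
    "fields": {
      "name": "Cozy loft",
      "city": "Amsterdam",
      "location": {"lat": 52.37, "lon": 4.9}
    }
  }
}
""")

def flatten(d, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict,
    joining nested keys with `sep` (e.g. location.lat -> location_lat)."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

flat = flatten(raw_record["record"]["fields"])
print(flat)
```

In PySpark the same effect is typically achieved by selecting nested columns (e.g. `col("location.lat")`) into top-level columns before writing the table to OneLake.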
All the data exploration can be found in the eda notebook.
Based on the data exploration, we determined which transformations to perform on the dataset. In the transformation notebook the dataset is transformed and written to its final form before model training is performed.
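As a rough illustration of the kind of transformations such a notebook applies to listing data (the column names and steps below are hypothetical, not taken from the actual notebook), here is a small pandas sketch:

```python
import pandas as pd

# Hypothetical mini listings dataset; columns are illustrative only.
df = pd.DataFrame({
    "price": ["$120.00", "$85.50", None],
    "room_type": ["Entire home/apt", "Private room", "Private room"],
    "minimum_nights": [2, 1, 3],
})

# Typical cleaning steps: strip currency formatting, cast the target
# to float, drop rows without a target, one-hot encode categoricals.
df["price"] = df["price"].str.replace(r"[$,]", "", regex=True).astype(float)
df = df.dropna(subset=["price"])
df = pd.get_dummies(df, columns=["room_type"])
```

After steps like these, the resulting table can be written out as the final training dataset.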
To train the different models we used the following algorithms:
- Linear Regression
- Decision Tree
- Random Forest
- Multilayer Perceptron
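A compact sketch of comparing these four algorithm families on a regression task, here using scikit-learn on synthetic data (the actual notebooks run on Fabric and may use different libraries and hyperparameters):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data standing in for the prepared listings table.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

# Fit each model and score it on the held-out test split.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse[name] = mean_squared_error(y_test, preds) ** 0.5
print(rmse)
```

Comparing all candidates on the same train/test split keeps the scores directly comparable before picking a winner.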
MLflow was used to create a Machine Learning experiment, track results, and store artifacts. The script can be found in ml_flow_experiment.
In the Random Forrest folder we share the model artifact of the best-performing model of this experiment, which was a Random Forest. Alongside the model artifact generated by Fabric, there is also a numpy file holding the predictions and the ground-truth values of the test set.
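A numpy file like that can be loaded back to recompute evaluation metrics without rerunning the model. A small self-contained sketch (the file name, key names, and values below are invented; the actual file in the folder may use a different layout):

```python
import numpy as np

# Recreate a tiny predictions/ground-truth pair the way such a file
# could be saved and read back; values are illustrative only.
y_true = np.array([120.0, 85.0, 200.0, 60.0])
y_pred = np.array([115.0, 90.0, 210.0, 55.0])
np.savez("predictions_example.npz", y_pred=y_pred, y_true=y_true)

data = np.load("predictions_example.npz")
rmse = float(np.sqrt(np.mean((data["y_pred"] - data["y_true"]) ** 2)))
mae = float(np.mean(np.abs(data["y_pred"] - data["y_true"])))
print(rmse, mae)
```

Shipping the predictions alongside the model makes it easy to audit the reported scores or compute additional metrics later.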