
patrickbloomingdale/AirBnB-Data-Analysis-Austin-TX


Analyzing Airbnb Austin using Data Science

Project 1, Udacity's Data Scientist Nanodegree


Introduction

For this project, I created a blog post and a GitHub repository for my data science portfolio. The project required me to:

  • Come up with three questions I am interested in answering.
  • Extract the data needed to answer these questions.
  • Perform the necessary cleaning, analysis, and modeling.
  • Evaluate my results.
  • Share my insights with stakeholders.
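
The modeling and evaluation steps above can be sketched with the scikit-learn pieces this project uses (LinearRegression, train_test_split, r2_score, mean_squared_error). This is a minimal sketch, not the notebook's actual code: the function name `fit_and_evaluate`, the columns `bedrooms`, `accommodates`, and `price`, and the toy data are all hypothetical, and it assumes cleaning has already left every feature numeric with no missing values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(df, response_col, test_size=0.3, random_state=42):
    """Fit a linear regression on df; return (model, test R^2, test MSE).

    Assumes every column except response_col is numeric and has no missing
    values (hypothetical helper, for illustration only).
    """
    X = df.drop(columns=[response_col])
    y = df[response_col]
    # Hold out part of the data so the scores measure generalisation,
    # not how well the model memorised the training rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return model, r2_score(y_test, y_pred), mean_squared_error(y_test, y_pred)


# Hypothetical toy data: price is an exact linear function of two features,
# so the fitted model should recover the coefficients almost perfectly.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=100),
    "accommodates": rng.integers(1, 9, size=100),
})
toy["price"] = 50 + 40 * toy["bedrooms"] + 10 * toy["accommodates"]

model, test_r2, test_mse = fit_and_evaluate(toy, "price")
print(f"test R^2 = {test_r2:.3f}, test MSE = {test_mse:.3f}")
```

On real listings the R² will be far lower than on this noise-free toy frame; the point of the held-out split is that the reported score reflects unseen data.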

Libraries

  • pandas
  • numpy
  • collections (defaultdict)
  • calendar
  • datetime
  • seaborn
  • matplotlib (pyplot; plots are rendered inside the notebook with %matplotlib inline)
  • scikit-learn (LinearRegression, train_test_split, r2_score, mean_squared_error, metrics)
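
`collections.defaultdict` is convenient for tallying items in a column that packs several values into one string. The sketch below assumes an amenities-style column formatted like `{TV,Wifi,"Free parking"}` or `["TV", "Wifi"]`; the function name `count_values` and the sample values are hypothetical, and the simple comma split will mishandle items that themselves contain commas.

```python
from collections import defaultdict

import pandas as pd


def count_values(series, sep=","):
    """Count how often each item appears in a column of delimited strings.

    Missing cells are skipped; surrounding braces/brackets, whitespace, and
    double quotes are stripped from each item (hypothetical helper).
    """
    counts = defaultdict(int)  # unseen keys start at 0
    for cell in series.dropna():
        for item in str(cell).strip("{}[]").split(sep):
            item = item.strip().strip('"')
            if item:
                counts[item] += 1
    return dict(counts)


# Hypothetical sample values, not taken from the Austin dataset.
amenities = pd.Series(['{TV,Wifi,Kitchen}', '{Wifi,"Free parking"}', None])
amenity_counts = count_values(amenities)
for name, n in sorted(amenity_counts.items(), key=lambda kv: -kv[1]):
    print(name, n)
```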

Project files

  • listings_austin.csv.gz - Inside Airbnb listings dataset for Austin, used in the analysis
  • calendar_austin.csv.gz - Inside Airbnb calendar dataset for Austin, used in the analysis
    NOTE: The calendar dataset is too large to upload to GitHub; download it from Inside Airbnb: http://insideairbnb.com/get-the-data.html.
  • AirBnB Austin Texas.ipynb - final project notebook
  • AirBnB Austin Texas.html - HTML export of the final project notebook
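
pandas can read the gzipped CSV files directly, inferring gzip compression from the `.gz` extension. A minimal loading sketch, assuming the calendar file has `date`, `available` (`t`/`f`), and `price` (strings such as `$1,200.00`) columns; check these against your own download. The helper names `load_calendar` and `parse_price` are hypothetical.

```python
import pandas as pd


def parse_price(prices):
    """Convert strings like '$1,200.00' to floats; missing or bad values become NaN."""
    cleaned = prices.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def load_calendar(path="calendar_austin.csv.gz"):
    """Load the calendar file and normalise its key columns.

    Assumed columns: 'date', 'available' ('t'/'f'), 'price' ('$1,200.00').
    Compression is inferred from the .gz extension.
    """
    cal = pd.read_csv(path, parse_dates=["date"])
    cal["available"] = cal["available"].map({"t": True, "f": False})
    cal["price"] = parse_price(cal["price"])
    return cal
```

Converting `price` to a float early means later grouping and averaging steps work on numbers rather than currency strings.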

Rubric

Code Functionality

  • Code is readable (uses good coding practices, PEP8): The code has an easy-to-follow logical structure and uses comments and/or notebook Markdown cells effectively. The steps of the data science process (gather, assess, clean, analyze, model, visualize) are clearly identified with comments or Markdown cells. Variable and function names follow the PEP8 style guide.
  • Code is functional: All of the project code is contained in a Jupyter notebook that demonstrates successful execution and output of the code.

Data

  • Project follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) process while analyzing the data: The project follows CRISP-DM from the questions through the communication of results, documented in either the README or the notebook. If a question does not require machine learning, descriptive or inferential statistics are used to give a compelling answer.
  • Categorical and missing values are handled properly: Categorical variables are handled appropriately for machine learning models (if models are created). Missing values are handled appropriately for both descriptive and ML techniques. The project documents why each approach was used and why it was appropriate for the situation.
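
One way to meet this criterion, sketched under assumptions: drop rows with a missing target, mean-impute numeric features, and one-hot encode categorical columns. The function `prepare_features` and the columns `price`, `bedrooms`, and `room_type` are hypothetical; the docstring records why each choice was made, as the criterion asks.

```python
import pandas as pd


def prepare_features(df, response_col, cat_cols):
    """Return (X, y) ready for a scikit-learn regression (hypothetical helper).

    - Rows with a missing response are dropped: imputing the target would
      invent the very values the model is meant to predict.
    - Numeric feature gaps are filled with the column mean, which keeps the
      row count and leaves the column average unchanged.
    - Categorical columns are one-hot encoded; dummy_na=True adds an
      indicator column so "missing" is kept as information, and
      drop_first=True avoids a redundant dummy column.
    """
    df = df.dropna(subset=[response_col])
    y = df[response_col]
    X = df.drop(columns=[response_col])

    num_cols = X.select_dtypes(include="number").columns
    X[num_cols] = X[num_cols].fillna(X[num_cols].mean())

    X = pd.get_dummies(X, columns=cat_cols, drop_first=True, dummy_na=True)
    return X, y


# Hypothetical toy rows with gaps in the target, a numeric, and a categorical column.
toy = pd.DataFrame({
    "price": [100.0, 80.0, None, 150.0],
    "bedrooms": [2.0, None, 1.0, 3.0],
    "room_type": ["Entire home/apt", "Private room", "Private room", None],
})
X, y = prepare_features(toy, "price", ["room_type"])
print(X)
```

Mean imputation is a simple default; it shrinks the variance of the imputed column, so for heavily missing columns dropping the column or using a more careful imputer may be preferable.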

Analysis, Modeling, Visualization

  • 3-5 business questions are answered: Between three and five questions are asked, each related to the business or real-world context of the data. Each question is answered with an appropriate visualization, table, or statistic.
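
A question such as "how does the average listed price vary across the year?" can be answered with a simple grouped statistic. This sketch assumes a calendar DataFrame with a datetime `date` column and a numeric `price` column; the function name `mean_price_by_month` and the toy rows are hypothetical. Python's `calendar` module supplies the month names.

```python
import calendar

import pandas as pd


def mean_price_by_month(cal):
    """Average listed price per calendar month, labelled by month name.

    Assumes a datetime 'date' column and a numeric 'price' column; NaN prices
    are ignored by mean(), and months from different years are pooled.
    """
    monthly = cal.groupby(cal["date"].dt.month)["price"].mean()
    monthly.index = [calendar.month_name[int(m)] for m in monthly.index]
    return monthly


# Hypothetical toy rows, not real Austin prices.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-03-15"]),
    "price": [100.0, 120.0, 250.0],
})
monthly_prices = mean_price_by_month(toy)
print(monthly_prices)
```

The resulting Series plots directly as a bar chart (for example with `monthly_prices.plot.bar()`), which pairs the statistic with the visualization the criterion asks for.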

Github Repository

  • Student publishes their code in a public GitHub repository: The student must have a GitHub repository for the project. The repository must have a README.md file that communicates the libraries used, the motivation for the project, the files in the repository with a short description of each, a summary of the results of the analysis, and necessary acknowledgements. Students should not use another student's code to complete the project, but they may use other references on the web, including Stack Overflow and Kaggle.

Blog Post

  • Findings are communicated to stakeholders: The student must have a blog post on a platform of their choice (their own website, a Medium post, or a GitHub blog post) and must communicate the results clearly. The post should not dive into the technical details or difficulties of the analysis; those belong on GitHub. The post should be understandable to non-technical readers from many fields.
  • There is an intriguing title and image related to the project: The student must have a title and an image to draw readers to the post.
  • The body of the post has paragraphs broken up by appropriate white space and images: There are no long, unbroken blocks of text without line breaks or images for separation anywhere in the post.
  • Each question has a clearly communicated solution: Each question is answered with a clear visual, table, or statistic showing how the data supports or contradicts a hypothesis that could be formed from the question.
