# Quality data science deliverables

Later today you will be given your **first project assignment**. Many questions will be answered then. 

But it is safe to assume that a **presentation** and a **Jupyter notebook** will be involved.
I'm sure we've all seen... *questionable* presentations.

![gif-powerpoint](https://media1.giphy.com/media/94iS62lx8CRQA/giphy.gif?cid=790b76115ccb171b6b7a426359453df9&rid=giphy.gif)

## Learning goals:

- Assess a few presentation decks and generate a list of best practices
- Practice formatting text in markdown
- Identify the differences between Jupyter notebooks in different stages of analysis
- Identify your growth area in presentations and generate a list of steps to improve it


## Goal 1: Presentations and decks

For reading on your own time:

[Article 1](https://24slides.com/presentbetter/seven-worst-presentations-time-went-wrong/), [Article 2](https://blog.lemonadestand.org/bad-powerpoint-presentations/), [Article 3](https://elearningindustry.com/creating-powerpoint-presentations-5-mistakes-avoid), [Article 4](https://www.makeuseof.com/tag/powerpoint-presentation-mistakes/)

### Content vs Style vs Format

**Content**:
- Does your deck tell a story?
- Is your story in three parts?
- What is your text?
- What are your titles?

**Style**:

![style guide](http://www.creditlenders.info/wp-content/uploads/style-guide-examples-style-guidelines-examples-the-make-shop-inspiration-pinterest.jpg)

**Format**:
- Grammar
- Whitespace
- Transitions

### Task:

Review these two decks: [V1](https://docs.google.com/presentation/d/1Vvnya28jwm7EBWDImB-easLgUZuW3Uj_gJSpTeITYnw/edit?usp=sharing) and [V2](https://docs.google.com/presentation/d/1i2_9VSOeX78UK6EGrUNnI9j4b7PPt_hia_8z1DWYju0/edit?usp=sharing)
- Which is "better"?
- What could be improved in each?

## Goal 2: Practice formatting text in markdown

***

Which of these two is easier to read?

[Doc 1](https://github.com/learn-co-students/dc_ds_06_03_19/blob/b59f944911dd153543f8ef44e004e2f9b595b9ed/module_1/week_2/week_5_git_groups_quality_presentations/markdown-test/no-markdown/README.md) vs [Doc 2](https://github.com/learn-co-students/dc_ds_06_03_19/blob/b59f944911dd153543f8ef44e004e2f9b595b9ed/module_1/week_2/week_5_git_groups_quality_presentations/markdown-test/README.md)


### Task: update Doc 1 to match the style of Doc 2

[markdown cheat sheet](https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf)

----
# Predicting tomorrow's rainfall in Australia

## Executive summary

The goal of this analysis is to predict whether it will rain tomorrow in a given city in Australia, based on today's weather. We analyzed a Kaggle dataset of (daily weather data <- bold 'daily weather data' and delete everything else in the parentheses!) collected in 49 Australian cities between 2010 and 2018. The final model had an 83.9% accuracy rate in predicting whether or not it would rain tomorrow vs. a baseline model accuracy of 77.8%. Italicize the sentence before this one and delete this sentence!

![Rainfall Map](rainfall_map.png)

## Contents

* [Introduction](#Introduction) 
    * [Problem statement](#Problem-statement) 
    * [Dataset](#Dataset) 
* [Analysis](#Analysis) 
    * [Data cleaning](#Data-cleaning) 
    * [Exploratory data analysis](#Exploratory-data-analysis)
    * [Modeling](#Modeling)
    * [Metrics](#Metrics)
* [Next steps](#Next-steps)

## Introduction

### Problem statement

Our goal is to predict whether or not it will rain tomorrow, in a given city in Australia, based on today's weather, using replicable machine learning techniques. We used two separate modeling techniques (Decision Tree and Random Forest) to create these forecasts.

### Dataset

The analysis is based on this [Kaggle dataset](https://www.kaggle.com/jsphyg/weather-dataset-rattle-package) of daily weather data collected in 49 Australian cities between 2010 and 2018. The dataset consists of 22 numerical and categorical features, and 1 target classification variable.

## Analysis

### Data cleaning

We had duplicate data, missing values, and impossible values that we had to rectify before proceeding to modeling.

### Exploratory data analysis

We found some significant correlations in our data that we had to explore further.

### Modeling

Our analysis used both Decision Tree and Random Forest classifier models.

We can create multiline Python code with syntax highlighting:

```python
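import time
from sklearn import ensemble, tree

def model():
    for kind in models:  # models: list of model names, assumed defined elsewhere
        start = time.time()  # start a timer for this model
        if kind == 'Tree':
            model = tree.DecisionTreeClassifier()
        else:
            model = ensemble.RandomForestClassifier()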
```

### Metrics

We create a confusion matrix with the `sklearn.metrics.confusion_matrix()` function and use it to calculate classification metrics.
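As a rough illustration of what those metrics look like, here is a minimal sketch in plain Python. The helper names (`confusion_counts`, `accuracy`, `baseline_accuracy`) and the labels are made up for this example and are not taken from the project. The table uses the same layout that `sklearn.metrics.confusion_matrix` returns for labels 0 and 1: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    counts = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts


def accuracy(counts):
    """Share of predictions on the diagonal of the confusion matrix."""
    (tn, fp), (fn, tp) = counts
    return (tn + tp) / (tn + fp + fn + tp)


def baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class."""
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)


# Hypothetical labels: 1 = rain tomorrow, 0 = no rain tomorrow
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)                     # confusion matrix as nested lists
print(accuracy(counts))           # model accuracy on the toy labels
print(baseline_accuracy(y_true))  # accuracy of the "always no rain" guess
```

Comparing the model's accuracy with the majority-class baseline is the same kind of comparison the executive summary makes between the final model and the baseline model.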

## Next steps

![finalimg](https://i.imgur.com/3fkDIms.jpg)

----

## Goal 3: Identify the differences between Jupyter notebooks in different stages of analysis

When you write a paper or a report, what version do you share with your boss or your audience? Does it look different than when you started?

### Task: order notebooks
- We have three notebooks
  - V1_Australia_rain.ipynb
  - V2_Australia_rain.ipynb
  - V3_Australia_rain.ipynb
- Note their differences and similarities
- Review them and rank them from 1 to 3, least to most polished
  - 1 being just for your eyes
  - 3 being for a wider audience


## Integration:
You will integrate and demonstrate these skills during your presentations and over the next few weeks. This is a growth curve for most people. We want you to deliver a *quality* presentation.

![baby](https://media2.giphy.com/media/l0HeqCXn1qjs8dqKs/giphy.gif?cid=790b76115ccb226549494475368e0b72&rid=giphy.gif)

## Reflection:

- Which one area will need the most focus from you, and where do you expect to grow the most?
