1. Go to https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016 and download the data

2. Use the command line to rename the downloaded file to `suicide_rates.csv`, then move the file into your repository

use `mv` (it can rename and move a file in a single command)

3. Load the data into a pandas DataFrame

4. Pick only the rows where `country` starts with 'U'

use `apply` and `lambda`
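One possible approach to step 4, shown as a minimal sketch. The small DataFrame below is made-up stand-in data so the snippet runs on its own; in the exercise you would load the real file with `pd.read_csv("suicide_rates.csv")` instead. The variable names are illustrative only.

```python
import pandas as pd

# Stand-in for pd.read_csv("suicide_rates.csv"); the rows here are made up.
df = pd.DataFrame({
    "country": ["Albania", "Ukraine", "United States", "Uruguay", "Brazil"],
    "year": [2015, 2015, 2015, 2015, 2015],
})

# apply() calls the lambda once per value and returns a boolean Series.
starts_with_u = df["country"].apply(lambda name: name.startswith("U"))

# Indexing with the boolean Series keeps only the rows marked True.
u_countries = df[starts_with_u]
print(u_countries)
```

The lambda assumes every value in `country` is a string; a missing value would raise an error on `.startswith`.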

5. Get the names of the unique countries that remain

use `.unique()`

[Thoughts]: How would that be useful if we have duplicated values in a dataset?
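As a hint toward the question above, here is a small sketch of what `.unique()` does with a column that repeats the same country across several rows. The data is made up for illustration.

```python
import pandas as pd

# Made-up rows: each country appears once per year, so names repeat.
u_countries = pd.DataFrame({
    "country": ["Ukraine", "Ukraine", "United States", "Uruguay", "United States"],
    "year": [2014, 2015, 2014, 2015, 2015],
})

# unique() collapses the repeats and keeps the order of first appearance.
names = u_countries["country"].unique()
print(names)
print(len(u_countries), "rows, but only", len(names), "distinct countries")
```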

6. Pick only rows with 'United States' (without using apply or lambda - try using indexing)
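A minimal sketch of the indexing approach, assuming the country is stored exactly as `'United States'` in the `country` column. The DataFrame is made-up stand-in data.

```python
import pandas as pd

# Made-up stand-in for the filtered data from the earlier steps.
df = pd.DataFrame({
    "country": ["Ukraine", "United States", "Uruguay", "United States"],
    "year": [2015, 2014, 2015, 2015],
})

# Comparing a column to a value yields a boolean Series (a mask);
# passing that mask inside [] selects the matching rows.
us = df[df["country"] == "United States"]
print(us)
```

`df.loc[df["country"] == "United States"]` does the same thing and is handy when you also want to pick columns in the same step.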

7. Group by the `age` column and sum the number of suicides

8. Change the `age` column to a categorical one

use `astype`

9. Re-order the categories so that 5-14 years appears first

use `cat.reorder_categories`

10. Draw a bar plot of the number of suicides (y) per age group (x) for 2015

11. Save the plot to the visualization folder we have created

12. Push everything to your git branch.
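The grouping, categorical, plotting and saving steps above could be sketched roughly as follows. This is one possible solution, not the only one. Assumptions: the count column is named `suicides_no` (check `df.columns` in your copy of the data), the output file name `suicides_by_age_2015.png` is made up, and the tiny DataFrame stands in for the United States rows. The real data has more age groups, and `reorder_categories` needs every existing category listed exactly once.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up stand-in for the 'United States' rows.
us = pd.DataFrame({
    "year": [2015, 2015, 2015, 2015, 2014],
    "age": ["15-24 years", "5-14 years", "15-24 years", "25-34 years", "5-14 years"],
    "suicides_no": [10, 2, 5, 7, 1],
})

# Group by age and sum the counts (all years together).
totals = us.groupby("age")["suicides_no"].sum()

# Convert age to a categorical column; categories start out sorted as text,
# which puts '15-24 years' before '5-14 years'.
us["age"] = us["age"].astype("category")

# Re-order so that '5-14 years' comes first.
age_order = ["5-14 years", "15-24 years", "25-34 years"]
us["age"] = us["age"].cat.reorder_categories(age_order, ordered=True)

# Keep 2015 only, then sum per age group; groups follow the category order.
per_age_2015 = (
    us[us["year"] == 2015]
    .groupby("age", observed=False)["suicides_no"]
    .sum()
)

# Bar plot: age group on x, number of suicides on y.
ax = per_age_2015.plot(kind="bar")
ax.set_xlabel("Age group")
ax.set_ylabel("Number of suicides")
ax.set_title("Suicides per age group, 2015")

# Save into the visualization folder (created here if it is missing).
os.makedirs("visualization", exist_ok=True)
out_path = os.path.join("visualization", "suicides_by_age_2015.png")
ax.figure.savefig(out_path, bbox_inches="tight")
```

Because the column is categorical with an explicit order, the bars come out in age order rather than alphabetical order, which is the point of the re-ordering step.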