## Lab: Data Structures and Python with Pokemon
Building "Pokemon Stay"
You are an analyst at a "scrappy" online gaming company that specializes in remakes of last year's fads.

Your boss, who runs the product development team, is convinced that Pokemon Go's fatal flaw was that you had to actually move around outside. She has design mock-ups for a new game called Pokemon Stay: in this version players still need to move, but just from website to website. Pokemon gyms are now popular online destinations, and catching Pokemon in the "wild" simply requires browsing the internet for hours in the comfort of your home.

She wants you to program a prototype version of the game, and analyze the planned content to help the team calibrate the design.

In [1]:
from IPython.display import display

## 1. Defining a player

Each player needs a set of characteristics, stored in variables, such as an id, a username, play data, etc. A great structure to house these variables is a dictionary, because its values can be any Python datatype, including list, dict, tuple, int, float, bool, or str.

The player variables are:

player_id : id code unique to each player (integer)
player_name : entered name of the player (string)
time_played : total time the player has spent in the game, in minutes (float)
player_pokemon: the player's captured pokemon (dictionary)
gyms_visited: ids of the gyms that a player has visited (list)

## A) Create a dict for a single player.
The player_id should be 1
Since the player doesn't have a name yet, you may set the player_name equal to None
The rest of the fields should be populated properly depending on the datatype.

In [2]:

# dictionary for a single player
player_1 = {
    'player_id' : 1,
    'player_name' : None,
    'time_played' : 0.0,
    'player_pokemon': {},
    'gyms_visited': [],
}

## B) Create a dict to house your dataset of players.

Because only player_1 exists, there should only be one key:value pair.

The keys of this dict should be the player_id, and the values should be the dictionaries with 
    single-player info, including the player_id (slightly redundant).
Use the display function imported above to display poke_players.

In [3]:

# putting an already made player into the poke_players dict
# this creates a dictionary(player_1) nested in another dictionary (poke_players)
# we utilized a value in the player_1 dictionary to create its own key for the poke_players dict
poke_players = {
    player_1['player_id']:player_1
}

display(poke_players)

{1: {'gyms_visited': [],
  'player_id': 1,
  'player_name': None,
  'player_pokemon': {},
  'time_played': 0.0}}

## C) Update player 1's info with your own.

By indexing your poke_players dictionary, update the player_name field to your own name.
display your poke_players dict to check your work.

In [4]:
# accessing a nested dictionary requires us to go layer by layer
# `poke_players[1] == player_1`
# `player_1['player_name'] = 'Mike'`
# if `poke_players[1]` IS `player_1` we can substitute to get `poke_players[1]['player_name'] = 'Mike'`

poke_players[1]['player_name'] = 'Mike'
poke_players

{1: {'gyms_visited': [],
  'player_id': 1,
  'player_name': 'Mike',
  'player_pokemon': {},
  'time_played': 0.0}}
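The substitution described in the comments above works because the outer dictionary stores a reference to the inner dictionary rather than a copy. A minimal sketch of that behavior, using hypothetical stand-ins (`player`, `players`) for the lab's `player_1` and `poke_players`:

```python
# Hypothetical stand-ins for the lab's player_1 and poke_players objects.
player = {'player_id': 1, 'player_name': None}
players = {player['player_id']: player}

# The outer dict holds a reference to the inner dict, not a copy.
same_object = players[1] is player

# Updating through the outer dict therefore changes the original too.
players[1]['player_name'] = 'Mike'

print(same_object)            # True
print(player['player_name'])  # Mike
```

This is why updating `poke_players[1]` in the cell above also changes `player_1`.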

## D) Define a function that adds a player to poke_players.
Your function should...

Take arguments for players_dict, player_id, and player_name.
Create a player with the above values and populate the gyms_visited, player_pokemon, and time_played accordingly.
Returns the name of the player added.
Add a second player to the players dictionary. The id should be 2, but the name is up to you!
Display your poke_players to check your work.

In [5]:
# function takes 3 arguments: `players_dict` is where the new player will be added,
# `player_id` and `player_name` are used to create the new player
def add_player(players_dict, player_id, player_name):
    new_player = {
        'player_id': player_id,
        'player_name': player_name,
        'player_pokemon': {}, # player_pokemon, gyms_visited and time_played start at default values
        'gyms_visited': [],
        'time_played': 0.0
    }
    # once again using a value in the new dict as that dict's key in players_dict
    players_dict[new_player['player_id']] = new_player
    print("{} added!".format(new_player['player_name']))
    return new_player['player_name'] # the task asks for the added player's name to be returned

In [6]:
add_player(poke_players, 2, 'Joshua')

Joshua added!

'Joshua'


In [7]:
poke_players

{1: {'gyms_visited': [],
  'player_id': 1,
  'player_name': 'Mike',
  'player_pokemon': {},
  'time_played': 0.0},
 2: {'gyms_visited': [],
  'player_id': 2,
  'player_name': 'Joshua',
  'player_pokemon': {},
  'time_played': 0.0}}

## 2. Defining "gym" locations
Since you are the sole programmer, Pokemon Stay will have to start small. To begin, there will be 10 different gym location websites on the internet. The gym locations are:

1. 'reddit.com'
2. 'amazon.com'
3. 'twitter.com'
4. 'linkedin.com'
5. 'ebay.com'
6. 'netflix.com'
7. 'stackoverflow.com'
8. 'github.com'
9. 'quora.com'
10. 'google.com'

Set up a list of all the gym locations. This will be a list of strings. Print the list to check your work.
For each player in poke_players, use sample (imported from random below) to randomly select 2 gyms and add these gyms to the gyms_visited field.
Display the poke_players dict to check your work.

In [8]:
from random import sample

In [9]:
# Run this cell a few times to understand sample. Play around with the function!
this_list = ['apple', 1, ('a','b','c'), 0.8]
sample(this_list, 3)

[('a', 'b', 'c'), 0.8, 'apple']
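To make the behavior of `sample` easier to reason about, here is a small sketch that uses a seeded `random.Random` instance (the seed value 42 is arbitrary and not part of the lab). It illustrates that `sample` draws without replacement, leaves the original list untouched, and raises `ValueError` when asked for more items than the list holds.

```python
from random import Random

rng = Random(42)  # seeded generator so reruns give the same draws (seed is arbitrary)
items = ['apple', 1, ('a', 'b', 'c'), 0.8]

picked = rng.sample(items, 3)

# sample draws without replacement: no element is picked twice,
# and the original list is left unchanged.
print(picked)
print(len(set(picked)) == 3, len(items) == 4)

# Asking for more items than the list holds raises ValueError.
too_many_failed = False
try:
    rng.sample(items, 5)
except ValueError:
    too_many_failed = True
print(too_many_failed)
```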

In [10]:
gyms = [
    'reddit.com',
    'amazon.com',
    'twitter.com',
    'linkedin.com',
    'ebay.com',
    'netflix.com',
    'stackoverflow.com',
    'github.com',
    'quora.com',
    'google.com',
]

In [11]:
for player_id in poke_players.keys():
    poke_players[player_id]['gyms_visited'].extend(sample(gyms,2)) 
    # extend allows us to add a list to the end of another

In [12]:
# each player has now visited 2 randomly selected gyms
# (running the cell above again would add 2 more gyms per player each time)
poke_players

{1: {'gyms_visited': ['google.com', 'stackoverflow.com'],
  'player_id': 1,
  'player_name': 'Mike',
  'player_pokemon': {},
  'time_played': 0.0},
 2: {'gyms_visited': ['github.com', 'reddit.com'],
  'player_id': 2,
  'player_name': 'Joshua',
  'player_pokemon': {},
  'time_played': 0.0}}
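The loop above relies on `extend` rather than `append`. A short sketch of the difference, using hypothetical list names that are not part of the lab:

```python
visited_extend = ['reddit.com']
visited_append = ['reddit.com']
new_gyms = ['github.com', 'quora.com']

visited_extend.extend(new_gyms)  # adds each element of new_gyms
visited_append.append(new_gyms)  # adds new_gyms itself as one nested element

print(visited_extend)  # ['reddit.com', 'github.com', 'quora.com']
print(visited_append)  # ['reddit.com', ['github.com', 'quora.com']]
```

With `append`, the later `gym in player_dict['gyms_visited']` checks would fail, because the gym names would sit inside a nested list.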

## 3. Create a pokedex
We also need to create some pokemon to catch! Let's store the attributes of each pokemon in a dictionary, since each pokemon has many characteristics we'd like to store.

Each pokemon will be defined by these variables:

poke_id : unique identifier for each pokemon (integer, sequential)
poke_name : the name of the pokemon (string)
poke_type : the category of pokemon (string)
hp : base hitpoints (integer between 400 and 500)
attack : base attack (integer between 50 and 100)
defense : base defense (integer between 50 and 100)
special_attack : base special attack (integer between 100 and 150)
special_defense : base special defense (integer between 100 and 150)
speed : base speed (integer between 0 and 100)

## A) Create a function called create_pokemon
The function should take arguments for poke_id, poke_name, and poke_type.
Use np.random.randint to generate values for the numeric attributes based on the conditions above. If you're not clear on how this function works, there is a cell below with an example. Play around with it!
The function should return a dict for the pokemon.
Without assigning it to a variable, check the function's output by calling it with the following arguments:
poke_id = 1
poke_name = 'charmander'
poke_type = 'fire'

In [15]:
import numpy as np

In [16]:
# Play around with this cell to understand np.random.randint!

np.random.randint(0,10)

1
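One detail worth checking when you experiment: `np.random.randint(low, high)` includes `low` but excludes `high`. The sketch below draws many values to illustrate this; the sample size of 10,000 is arbitrary.

```python
import numpy as np

# low is inclusive, high is exclusive: values fall in 50..99
draws = np.random.randint(50, 100, size=10_000)
print(draws.min() >= 50, draws.max() <= 99)

# To allow 100 itself, pass high=101 (an adjustment the lab code does not make).
inclusive_draws = np.random.randint(50, 101, size=10_000)
print(inclusive_draws.min() >= 50, inclusive_draws.max() <= 100)
```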

In [18]:
# a function that returns a dictionary
# note: np.random.randint excludes its upper bound, so e.g. hp ranges from 400 to 499
def create_pokemon(poke_id, poke_name, poke_type):
    return {
        'poke_id': poke_id,
        'poke_name': poke_name,
        'poke_type': poke_type,
        'hp': np.random.randint(400,500), 
        'attack': np.random.randint(50,100),
        'defense': np.random.randint(50,100),
        'special_attack': np.random.randint(100,150),
        'special_defense': np.random.randint(100,150),
        'speed': np.random.randint(0,100)
    }

In [19]:
create_pokemon(3, 'bulbasaur', 'grass')

{'attack': 93,
 'defense': 56,
 'hp': 469,
 'poke_id': 3,
 'poke_name': 'bulbasaur',
 'poke_type': 'grass',
 'special_attack': 102,
 'special_defense': 122,
 'speed': 92}

## B) Populate the pokedex!
Now we need some pokemon to catch. Let's create a dictionary to store the information!

Instantiate an empty dictionary called pokedex.
Define a function called create_and_add_to_pokedex. This function should...
Take arguments for pokedex, poke_id, poke_name, and poke_type.
Uses the create_pokemon function you created earlier to create a pokemon using the provided poke_id, poke_name, and poke_type.

Add a new key:value pair to the pokedex dictionary where:
the key is the poke_id, and
the value is the newly-created pokemon dict, including the poke_id (this is slightly redundant, but that's ok!)
Prints the name of the pokemon added to the pokedex using str.format()
Add the following 3 pokemon to your pokedex using create_and_add_to_pokedex:

Id	Name	Type
1	charmander	fire
2	squirtle	water
3	bulbasaur	poison

Display your pokedex to check your work. It should look something like...

{1: {'attack': 64,
  'defense': 59,
  'hp': 495,
  'poke_id': 1,
  'poke_name': 'charmander',
  'poke_type': 'fire',
  'special_attack': 100,
  ...

In [20]:

# empty pokedex to add pokemon to
pokedex = {}

In [21]:
# use the create_pokemon function from above inside this function to create a pokemon and add it to the pokedex
def create_and_add_to_pokedex(pokedex, poke_id, poke_name, poke_type):
    poke = create_pokemon(poke_id, poke_name, poke_type)
    pokedex[poke['poke_id']] = poke
    print("{} added to pokedex".format(poke['poke_name']))


In [22]:
pokedex

{}

In [23]:
# create some pokemon for the dex
create_and_add_to_pokedex(pokedex, 1, 'charmander', 'fire')
create_and_add_to_pokedex(pokedex, 2, 'squirtle', 'water')
create_and_add_to_pokedex(pokedex, 3, 'bulbasaur', 'poison')

charmander added to pokedex
squirtle added to pokedex
bulbasaur added to pokedex


In [24]:
pokedex

{1: {'attack': 71,
  'defense': 65,
  'hp': 410,
  'poke_id': 1,
  'poke_name': 'charmander',
  'poke_type': 'fire',
  'special_attack': 105,
  'special_defense': 109,
  'speed': 96},
 2: {'attack': 72,
  'defense': 70,
  'hp': 410,
  'poke_id': 2,
  'poke_name': 'squirtle',
  'poke_type': 'water',
  'special_attack': 146,
  'special_defense': 148,
  'speed': 72},
 3: {'attack': 74,
  'defense': 65,
  'hp': 454,
  'poke_id': 3,
  'poke_name': 'bulbasaur',
  'poke_type': 'poison',
  'special_attack': 144,
  'special_defense': 103,
  'speed': 94}}

## 4. Let's capture some pokemon!
Each player's 'player_pokemon' dictionary keeps track of which pokemon that player has.

The keys of the 'player_pokemon' dictionaries are the pokemon ids that correspond to the ids in the pokedex dictionary you created earlier, and the values are the individual pokemon dicts. Just like your pokedex, but for each player individually!

Define a function called add_pokemon_to_player that...
Takes arguments for player_id, poke_id, player_dict, and pokedex.
You may set the default player_dict to poke_players and the default pokedex to the external variable pokedex
Adds the desired pokemon to the player_pokemon field of the specified player
Prints which pokemon was added to which player.
Use your function to add squirtle to player 1, and add charmander and bulbasaur to player 2
Display your poke_players to check your work.

In [25]:
# takes arguments player_id and poke_id and adds that pokemon to that player's captured pokemon
# two arguments have default values: player_dict and pokedex
# a default argument does not need to be passed explicitly, because the default value is used when it is omitted

def add_pokemon_to_player(player_id, poke_id, player_dict = poke_players, pokedex = pokedex):
    player_dict[player_id]['player_pokemon'][poke_id] = pokedex[poke_id]
    print("{} added to {}'s player_pokemon!".format(pokedex[poke_id]['poke_name'], player_dict[player_id]['player_name']))

In [26]:
# add some pokemon to some players
add_pokemon_to_player(1, 2)
add_pokemon_to_player(2,1)
add_pokemon_to_player(2,3)

squirtle added to Mike's player_pokemon!
charmander added to Joshua's player_pokemon!
bulbasaur added to Joshua's player_pokemon!


In [27]:
# player_pokemon is keyed by poke_id, so a player can hold at most one of each pokemon
# different players can hold the same pokemon (here both players end up with a bulbasaur)
add_pokemon_to_player(1,3)

bulbasaur added to Mike's player_pokemon!
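One consequence of assigning `pokedex[poke_id]` directly is that the player and the pokedex share the same dictionary object. The sketch below, built on hypothetical miniature structures (`mini_pokedex`, `mini_players`), contrasts that with storing an independent copy via `copy.deepcopy`. Whether sharing or copying is preferable depends on the game design; the lab does not specify either.

```python
import copy

# Hypothetical miniature pokedex and players, mirroring the lab's structures.
mini_pokedex = {3: {'poke_id': 3, 'poke_name': 'bulbasaur', 'attack': 74}}
mini_players = {
    1: {'player_name': 'Mike', 'player_pokemon': {}},
    2: {'player_name': 'Joshua', 'player_pokemon': {}},
}

# Player 1 stores a reference to the pokedex entry (what the lab function does).
mini_players[1]['player_pokemon'][3] = mini_pokedex[3]
# Player 2 stores an independent copy.
mini_players[2]['player_pokemon'][3] = copy.deepcopy(mini_pokedex[3])

# Changing the pokedex entry is visible through the reference, not the copy.
mini_pokedex[3]['attack'] = 99
print(mini_players[1]['player_pokemon'][3]['attack'])  # 99
print(mini_players[2]['player_pokemon'][3]['attack'])  # 74
```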


## 5. What gyms have players visited?


## A) Checking gyms
Write a for-loop that:

Iterates through the list of gym locations you defined before (named gyms in the solution above).
For each gym, iterate through each player in the players dictionary with a second, internal for-loop.
If the player has visited the gym, print out "[player] has visited [gym location].", filling in [player] and [gym location] with the current player's name and current gym location.

In [28]:
for gym in gyms: # look at each gym
    for player_id, player_dict in poke_players.items(): # look at all the players and their gyms
        if gym in player_dict['gyms_visited']: # if the current gym is one a player has visited
            print("{} has visited {}".format(player_dict['player_name'], gym)) # print that they have visited it.

Joshua has visited reddit.com
Mike has visited stackoverflow.com
Joshua has visited github.com
Mike has visited google.com


## B) Computational Complexity
How many times did that loop run? If you have N gyms and also M players, how many times would it run as a function of N and M?

(You can write your answer as Markdown text.)

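The body of the inner loop runs once for every (gym, player) pair: $N \text{ gyms} \times M \text{ players} = N \times M$ times. With the 10 gyms and 2 players above, that is $10 \times 2 = 20$ iterations.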
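A quick way to check this count is to increment a counter inside the nested loop. The sketch below uses hypothetical placeholder gyms and player ids with the same sizes as the prototype.

```python
# Hypothetical sizes: N gyms and M players, matching the lab's prototype.
n_gyms, m_players = 10, 2
gym_list = ['gym_{}'.format(i) for i in range(n_gyms)]
player_ids = list(range(1, m_players + 1))

iterations = 0
for gym in gym_list:
    for player_id in player_ids:
        iterations += 1  # the membership check would run here

print(iterations)  # 20
```

Note that each `gym in gyms_visited` check also scans a list, so the total work additionally grows with the length of each player's `gyms_visited`.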

## 6. Calculate player "power".
Define a function that will calculate a player's "power". Player power is defined as the sum of the base statistics of all of their pokemon.

$$
\text{player power} = \sum_{i = 1}^{n}\left(\text{attack}_i + \text{defense}_i + \text{special attack}_i + \text{special defense}_i\right)
$$
Where $i$ is an individual pokemon in a player's player_pokemon. ($\sum$ just means sum, so you're just adding up all the attributes listed above for all the pokemon in the player's player_pokemon).

Your function should:

Accept a poke_players dictionary and a player_id as arguments.
For the specified player_id, look up that player's pokemon.
Find and aggregate the attack, defense, special attack, and special defense values for each of the player's pokemon.
Print "[player name]'s power is [player power].", where the player power is the sum of the base statistics for all of their pokemon.
Return the player's power value.
Check your work by displaying pokemon power for each of your players.
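As a worked instance of the formula, the sketch below sums the four statistics for the squirtle and bulbasaur entries shown in the pokedex output earlier (your randomized values will differ). It uses a single `sum` over a generator expression, which is one possible way to write the aggregation and not necessarily the expected solution.

```python
# Stats copied from the pokedex output shown earlier for squirtle (2) and bulbasaur (3).
player_pokemon = {
    2: {'attack': 72, 'defense': 70, 'special_attack': 146, 'special_defense': 148},
    3: {'attack': 74, 'defense': 65, 'special_attack': 144, 'special_defense': 103},
}
power_attrs = ['attack', 'defense', 'special_attack', 'special_defense']

# Outer loop over pokemon (index i in the formula), inner loop over the four stats.
power = sum(poke[attr] for poke in player_pokemon.values() for attr in power_attrs)

print(power)  # 822  (436 for squirtle + 386 for bulbasaur)
```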

In [34]:
# 1 mandatory argument: the player ID
# take that ID, go through all the pokemon said ID has
# add up all the relevant stats and return that total value.
def get_power(player_id, player_dict = poke_players):
    power = 0.0 # base power to increment 
    attrs = ['attack','defense','special_attack','special_defense'] # list of attributes that we want 
    for poke_id, poke_dict in player_dict[player_id]['player_pokemon'].items(): # for each pokemon
        for attr in attrs: # grab the attribute from the pokemon
            power += poke_dict[attr] # add it to the power stat
    print("{}'s power is {}".format(player_dict[player_id]['player_name'], power))
    return power

In [49]:
poke_players

{1: {'gyms_visited': ['google.com', 'stackoverflow.com'],
  'player_id': 1,
  'player_name': 'Mike',
  'player_pokemon': {2: {'attack': 72,
    'defense': 70,
    'hp': 410,
    'poke_id': 2,
    'poke_name': 'squirtle',
    'poke_type': 'water',
    'special_attack': 146,
    'special_defense': 148,
    'speed': 72},
   3: {'attack': 74,
    'defense': 65,
    'hp': 454,
    'poke_id': 3,
    'poke_name': 'bulbasaur',
    'poke_type': 'poison',
    'special_attack': 144,
    'special_defense': 103,
    'speed': 94}},
  'time_played': 0.0},
 2: {'gyms_visited': ['github.com', 'reddit.com'],
  'player_id': 2,
  'player_name': 'Joshua',
  'player_pokemon': {1: {'attack': 71,
    'defense': 65,
    'hp': 410,
    'poke_id': 1,
    'poke_name': 'charmander',
    'poke_type': 'fire',
    'special_attack': 105,
    'special_defense': 109,
    'speed': 96},
   3: {'attack': 74,
    'defense': 65,
    'hp': 454,
    'poke_id': 3,
    'poke_name': 'bulbasaur',
    'poke_type': 'poison',
    'sp

In [45]:
def get_power(player_id, player_dict = poke_players):
    power = 0.0 # base power to increment 
    attrs = ['attack','defense','special_attack','special_defense'] # list of attributes that we want 
    for poke_id, poke_dict in player_dict[player_id]['player_pokemon'].items(): # for each pokemon
        for attr in attrs: # grab the attribute from the pokemon
            power += poke_dict[attr] # add the stat's value (not its name) to the power
    print("{}'s power is {}".format(player_dict[player_id]['player_name'], power))
    return power

In [48]:
# because values were randomized, yours will be different.  
for player_id in poke_players.keys():
    print(get_power(player_id))

Mike's power is 822.0
822.0
Joshua's power is 736.0
736.0

## 7. Load a pokedex file containing all the pokemon
Load data using the with open() method.
While you were putting together the prototype code, your colleagues were preparing a dataset of Pokemon and their attributes (This was a rush job, so they may have picked some crazy values for some...). Your task is to load the data into a list of lists so you can manipulate it.

The type of the data should be a list
The type of each element in that list should be a list
The type of each element in the sub-list should be str or float.
The code provided loads the data into one looooong str. To get it into the correct format:

Use your_string.replace() to remove ", where your_string is any object of type str.
Use your_string.split() to create a new row for each line. New lines are denoted with a '\n'.
Iterate through your data. Use try/except to cast numeric data as type float.
Your end result is effectively a matrix. Each list $i$ in the outer list is a row, and the $j$th elements of all the lists together form the $j$th column, which represents a data attribute. The first three lists in your pokedex list should look like this:

['PokedexNumber', 'Name', 'Type', 'Total', 'HP', 'Attack', 'Defense', 'SpecialAttack', 'SpecialDefense', 'Speed']
[1.0, 'Bulbasaur', 'GrassPoison', 318.0, 45.0, 49.0, 49.0, 65.0, 65.0, 45.0]
[2.0, 'Ivysaur', 'GrassPoison', 405.0, 60.0, 62.0, 63.0, 80.0, 80.0, 60.0]

WARNING: Don't print or display your entire new pokedex! Viewing that many entries will clog up your notebook and make it difficult to read.

In [None]:
# Code to read in pokedex info
raw_pd = ''
pokedex_file = 'pokedex_basic.csv'
with open(pokedex_file, 'r') as f:
    raw_pd = f.read()
    
# the pokedex string is assigned to the raw_pd variable
# the file is read in as a single string where newlines `\n` divide records

In [None]:
# cleans and splits the single string into appropriate records
# strip() drops any trailing newline so the split does not produce an empty final record
newlines = raw_pd.replace('"','').strip().split('\n')
# each record is now its own string
newlines[:3]

In [None]:
# splits record strings up into lists
new_pd = []
for line in newlines:
    new_pd.append(line.split(','))
    
new_pd[:2]

In [None]:
# converts numeric values from strings to floats
for i, line in enumerate(new_pd): # i indexes each record/row
    for j, item in enumerate(line): # j indexes each value in the record
        try:
            new_pd[i][j] = float(item) # succeeds for numeric strings
        except ValueError:
            pass # non-numeric values (names, types, header labels) stay as str

In [None]:
new_pd[:3]
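For comparison, the standard library's `csv` module can handle the quote removal and line splitting in one step. The sketch below runs on a small in-memory string instead of `pokedex_basic.csv`; the quoting style of `sample_text` and the helper name `to_number` are assumptions made for illustration.

```python
import csv
import io

# Two records in the shape shown above; the quoting is an assumption about the file.
sample_text = (
    '"PokedexNumber","Name","Type","Total"\n'
    '"1","Bulbasaur","GrassPoison","318"\n'
    '"2","Ivysaur","GrassPoison","405"\n'
)

def to_number(value):
    """Return value as a float when possible, otherwise unchanged."""
    try:
        return float(value)
    except ValueError:
        return value

# csv.reader strips the quotes and splits each line into fields.
rows = [[to_number(cell) for cell in record]
        for record in csv.reader(io.StringIO(sample_text))]

print(rows[0])  # ['PokedexNumber', 'Name', 'Type', 'Total']
print(rows[1])  # [1.0, 'Bulbasaur', 'GrassPoison', 318.0]
```

The lab asks for `replace` and `split`, so treat this only as a cross-check of the result.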

## 8. Changing Types
## A) Convert your data into a dictionary.
Your dict should...

use each pokemon's PokedexNumber as its key
have values containing data for each pokemon in a dictionary form, just like our pokedex from before
Keep in mind, the keys here are a little bit different than the original pokedex.
Be careful of the header, you do not want to include that as a pokemon.
WARNING: Don't display your entire pokedex when turning this in! Viewing that many entries will clog up your notebook and make it difficult to read. If you'd like to visualize your pokedex, index with a few of its keys.
Your new_pd_dict should be organized like...

{1.0: {'Attack': 49.0,
  'Defense': 49.0,
  'HP': 45.0,
  'Name': 'Bulbasaur',
  'PokedexNumber': 1.0,
  'SpecialAttack': 65.0,
  'SpecialDefense': 65.0,
  'Speed': 45.0,
  'Total': 318.0,
  'Type': 'GrassPoison'},
 2.0: {'Attack': 62.0,
  'Defense': 63.0,
  'HP': 60.0,
  'Name': 'Ivysaur',

In [None]:
header = new_pd[0] # extracting the column headers
body = new_pd[1:] # all the pokemon values

In [None]:
new_pd_dict = {} # empty dict to add pokemon to

for row in body:
    poke = dict(zip(header, row)) # pairs each column name in the header with the row's value
    new_pd_dict[poke['PokedexNumber']] = poke

## B) Orient your new_pd_dict by columns.
Your new pokedex is oriented by index, meaning that each entry is a row value. Your goal in this exercise is to orient the pokedex dict by columns, meaning:

The keys of the dictionary are the column names
The values of the dictionary are a column vector of that feature.
HINT: Read documentation on defaultdict (from collections import defaultdict), this may help!
BONUS: Do this with list and/or dictionary comprehensions only

In [None]:
from collections import defaultdict
### The below three blocks are 3 solutions to the same problem.

In [None]:
# Base solution
new_pd_dict_columns = {}

# each value is a list holding one attribute for every pokemon, e.g.
# new_pd_dict_columns['Name'] = [every pokemon's name]

# creating place holders
for col_name in header:
    new_pd_dict_columns[col_name] = []

for poke_id, single_poke_dict in new_pd_dict.items(): # go through every individual pokemons dict
    for field, attribute in single_poke_dict.items(): # go through all the values in the dict
        new_pd_dict_columns[field].append(attribute)

In [None]:
### Using defaultdict
new_pd_dict_columns = defaultdict(list)

for poke_id, single_poke_dict in new_pd_dict.items():
    for field, attribute in single_poke_dict.items():
        new_pd_dict_columns[field].append(attribute)

In [None]:

# Using list/dict comprehensions
new_pd_dict_columns = {
    col_name: [single_poke_dict[col_name] for single_poke_dict in new_pd_dict.values()]
    for col_name in header
}
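Another way to think about this re-orientation is as a transpose of the row matrix. The sketch below applies `zip(*rows)` to a hypothetical three-column slice of the header and body; it is a fourth option alongside the solutions above, not a replacement for them.

```python
# Hypothetical three-column slice of the header and body built earlier.
sample_header = ['PokedexNumber', 'Name', 'Total']
sample_body = [
    [1.0, 'Bulbasaur', 318.0],
    [2.0, 'Ivysaur', 405.0],
]

# zip(*sample_body) regroups the rows column by column;
# the outer zip pairs each column with its name from the header.
columns = {name: list(values)
           for name, values in zip(sample_header, zip(*sample_body))}

print(columns['Name'])   # ['Bulbasaur', 'Ivysaur']
print(columns['Total'])  # [318.0, 405.0]
```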

## 9. Write a function to filter your pokedex!
Your goal in this exercise is to search your pokedex based on your own defined criteria! Build a function that...

Takes arguments of:
a pokedex dict (can be either the row or column oriented dict, pick the one of your choice!)
a filter_options dict (described below)
For parameters in your filter_dict, your function should return:
pokemon that are >= (greater than or equal to) the value you passed in your filter_dict for that field for continuous values
pokemon of that name or type for string values (equal)
Return a list of the individual pokemon dictionaries that meet your search criteria!

In [None]:
from collections import Counter

In [None]:
# Example where orient is rows, not columns
def filter_pokemon(pokedex, filter_dict):
    results = [] # pokemon that satisfy every filter
    num_filters = len(filter_dict) # number of conditions a pokemon must meet
    for poke_id, poke_attr_dict in pokedex.items():
        counter = 0 # number of conditions this pokemon meets
        for field, value in filter_dict.items():
            if isinstance(value, str): # string filters require an exact match
                if value == poke_attr_dict[field]:
                    counter += 1
            else: # numeric filters require the pokemon's value to be >= the filter value
                if poke_attr_dict[field] >= value:
                    counter += 1
        if counter == num_filters: # every condition was met
            results.append(poke_attr_dict) # keep the pokemon
    return results

In [None]:
filter_dict = {'SpecialAttack':50,
               'Type':'GrassPoison',
               'Defense':80}
filter_pokemon(new_pd_dict, filter_dict)
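The counting logic above can also be expressed with `all`, which stops checking a pokemon as soon as one condition fails. The following is a sketch of that variant; the helper name `matches` and the two-entry `sample_pokedex` (values copied from the expected output shown earlier) are illustrative only.

```python
def matches(pokemon, filter_dict):
    """True when the pokemon satisfies every entry in filter_dict."""
    return all(
        # exact match for strings, >= comparison for numeric values
        pokemon[field] == wanted if isinstance(wanted, str) else pokemon[field] >= wanted
        for field, wanted in filter_dict.items()
    )

# Two rows copied from the expected output shown earlier in the lab.
sample_pokedex = {
    1.0: {'Name': 'Bulbasaur', 'Type': 'GrassPoison', 'SpecialAttack': 65.0, 'Defense': 49.0},
    2.0: {'Name': 'Ivysaur', 'Type': 'GrassPoison', 'SpecialAttack': 80.0, 'Defense': 63.0},
}
sample_filter = {'Type': 'GrassPoison', 'SpecialAttack': 70}

hits = [poke for poke in sample_pokedex.values() if matches(poke, sample_filter)]
print([poke['Name'] for poke in hits])  # ['Ivysaur']
```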

## 10. Descriptive statistics on the prototype pokedex


## A) What is the population mean and standard deviation of the "Total" attribute for all characters in the Pokedex?

In [None]:
import numpy as np

In [None]:

# mean
this_mean = np.mean(new_pd_dict_columns['Total'])
this_mean

In [None]:
# standard deviation
this_std = np.std(new_pd_dict_columns['Total'])
this_std
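Because the question asks for the population standard deviation, the default behavior of `np.std` (which divides by $n$, i.e. `ddof=0`) is the appropriate one here. The sketch below contrasts it with the sample standard deviation (`ddof=1`) on three hypothetical values.

```python
import numpy as np

totals = [318.0, 405.0, 525.0]  # hypothetical small set of 'Total' values

population_std = np.std(totals)      # ddof=0: divide by n
sample_std = np.std(totals, ddof=1)  # ddof=1: divide by n - 1

# The sample version is always the larger of the two for n > 1.
print(population_std < sample_std)  # True
```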

## B) Outlier detection part 1
The game is no fun if the pokemon are wildly unbalanced! Are any pokemon "overpowered", which we'll define as having a "Total" more than 2.5 standard deviations above the population mean?

In [None]:
# a list comprehension spread across several lines; it is equivalent to
# overpowered = []
# for poke_id, ind_poke_dict in new_pd_dict.items():
#     if ind_poke_dict['Total'] > 2.5*this_std + this_mean:
#          overpowered.append(ind_poke_dict)

overpowered = [ind_poke_dict 
               for poke_id, ind_poke_dict in new_pd_dict.items() 
               if ind_poke_dict['Total'] > 2.5*this_std + this_mean]

overpowered

## C) Outlier detection part 2
Tukey's method for outlier detection states that anything more than 1.5 * the interquartile range below the first quartile or above the third quartile is an outlier. Find outliers using this method!

In [None]:
first_quartile = np.percentile(new_pd_dict_columns['Total'], 25)
first_quartile

In [None]:
third_quartile = np.percentile(new_pd_dict_columns['Total'], 75)
third_quartile

In [None]:
iqr = third_quartile - first_quartile
iqr

In [None]:

lower_fence = first_quartile - 1.5 * iqr
upper_fence = third_quartile + 1.5 * iqr
display(lower_fence) # that iPython Display we imported earlier
display(upper_fence)

In [None]:
# same logic as the overpowered question, except our bounds are Tukey's fences (1.5 * IQR beyond the quartiles)
# AND we are checking both directions (High and Low).  
outliers = [ind_poke_dict 
               for poke_id, ind_poke_dict in new_pd_dict.items() 
               if ind_poke_dict['Total'] < lower_fence or ind_poke_dict['Total'] > upper_fence]

outliers
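To see the fences at work on something small enough to check by hand, the sketch below applies the same steps to eight hypothetical values, one of which is clearly extreme. It relies on `np.percentile` with its default linear interpolation.

```python
import numpy as np

sample_totals = [10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical values with one obvious extreme

q1, q3 = np.percentile(sample_totals, [25, 75])
sample_iqr = q3 - q1

# Tukey's fences: 1.5 * IQR below the first quartile and above the third quartile.
low_fence = q1 - 1.5 * sample_iqr
high_fence = q3 + 1.5 * sample_iqr

flagged = [value for value in sample_totals if value < low_fence or value > high_fence]
print(flagged)  # [40]
```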

## 11. Distributions, Sampling, and Confidence Intervals
Now that you've loaded your data and identified outliers, you'd like to understand your data as a whole. Use the 1.3 lesson as a guide to complete the following challenges.

## A) Plot histograms for each of the numeric values.
There are 7 numeric features (columns):

numeric_columns = ['Attack','Defense','HP','SpecialAttack','SpecialDefense','Speed','Total']
Using matplotlib.pyplot subplots, create a figure that:

displays a histogram of each feature
Use the column name as the title of each subplot

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
numeric_columns = ['Attack','Defense','HP','SpecialAttack','SpecialDefense','Speed','Total']

In [None]:
fig = plt.figure(figsize=(16,8))
for i, col in enumerate(numeric_columns): # for index and column name 
    fig.add_subplot(2,4,1+i) # create subplots, use index to increment 
    col_data = new_pd_dict_columns[col]
    plt.hist(col_data, bins=20)
    plt.title(col)

## B) Are any features normally distributed? What is the skew of each column?
Use scipy.stats.normaltest and scipy.stats.skew to find if each feature is normally distributed and to find if each distribution is skewed positive or negative.

In [None]:
from scipy.stats import normaltest, skew

In [None]:
for col in numeric_columns:
    norm_pvalue = normaltest(new_pd_dict_columns[col]).pvalue # test the null hypothesis of normality, keep the p-value
    col_skew = skew(new_pd_dict_columns[col]) # sample skewness: positive means a longer right tail
    if norm_pvalue < .05: 
        print('{} is not normally distributed since the pvalue {} is less than .05. The skew is {}\n'
              .format(col, round(norm_pvalue,2), round(col_skew,2)))
    else:
        print('{} is consistent with a normal distribution since the pvalue {} is not less than .05. The skew is {}\n'
              .format(col, round(norm_pvalue,2), round(col_skew,2)))

## C) Find the 90% confidence interval for the mean of each of the numeric columns
Like we did in the 1.3 lesson, create functions to sample your data and generate a confidence interval for the mean.
Use your functions to determine the 90% confidence interval for the mean of each column.
What is the interpretation of your confidence interval?

In [None]:
def get_sample_means(distribution, sample_size = 200, num_samples = 200, replace = False): # randomly sample
    sample_means = []
    for _ in range(num_samples):
        this_mean = np.random.choice(distribution, size=sample_size, replace=replace).mean()
        sample_means.append(this_mean)
    
    return sample_means

In [None]:
def confidence_interval_mean(distribution, confidence, sample_size = 200, num_samples = 200):
    
    '''
    distribution: the data whose mean you want a confidence interval for
    confidence: the confidence level as a percentage (e.g. 90)
    returns: lower and upper bound of the confidence interval.
    interpretation: the interval spans the middle <confidence> % of the sample means.
    If the sampling procedure were repeated many times, about <confidence> % of the intervals
    built this way would be expected to contain the actual (population) mean.
    '''
    
    means = get_sample_means(distribution, sample_size = sample_size, num_samples = num_samples)
    
    dist_from_0_or_100 = (100-confidence)/2
    lower_percentile, upper_percentile = 0+dist_from_0_or_100, 100-dist_from_0_or_100

    return (np.percentile(means, lower_percentile), np.percentile(means, upper_percentile))

In [None]:
for col in numeric_columns:
    lower, upper = confidence_interval_mean(new_pd_dict_columns[col], 90)
    print('We can be 90% confident that the population mean for {} lies between {} and {}\n'
          .format(col, round(lower,2), round(upper,2)))
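The functions above draw subsamples without replacement. A related and widely used technique is the bootstrap, which resamples with replacement at the full data size. The sketch below shows a percentile bootstrap on synthetic data; the seed, the normal distribution parameters, and the 1,000 resamples are all arbitrary assumptions, and `values` merely stands in for a column such as 'Total'.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for repeatability
values = rng.normal(loc=400, scale=100, size=800)  # synthetic stand-in for a numeric column

# Resample with replacement at the original size and record each resample's mean.
boot_means = [rng.choice(values, size=values.size, replace=True).mean()
              for _ in range(1000)]

# A 90% percentile interval keeps the middle 90% of the bootstrap means.
ci_lower, ci_upper = np.percentile(boot_means, [5, 95])

print(ci_lower < values.mean() < ci_upper)  # True
```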