# Modules

One of Python's greatest strengths is its modularity. Core Python contains only the most essential functions, and its capabilities are greatly extended by _modules_. A module is simply a collection of functions (and other objects, such as classes and constants) for carrying out particular tasks. Many modules ship with the standard Python installation (<https://docs.python.org/3/py-modindex.html>), and even more are installed by default with Anaconda (<https://docs.anaconda.com/anaconda/packages/old-pkg-lists/4.3.1/py35/>). You can also install additional modules through Anaconda (<https://conda.io/docs/user-guide/tasks/manage-pkgs.html>).
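To make the idea concrete, here is a minimal sketch that imports `math`, a standard-library module, and checks whether a module can be imported at all. The helper `is_installed` is a hypothetical name used only for this example. It relies on `importlib.util.find_spec` from the standard library, which returns `None` when the current Python interpreter cannot find a top-level module of that name.

```python
import importlib.util
import math  # standard-library module, available in every Python installation

# Functions from a module are reached through the module's name
print(math.sqrt(16))  # 4.0


def is_installed(module_name: str) -> bool:
    """Hypothetical helper: True if the current interpreter can find this top-level module."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("math"))                # True
print(is_installed("no_such_module_xyz"))  # False
```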

## The `pandas` module

Once a module is installed on your computer, you still have to import it before you can use its functions. We will now import a module used for data analysis in Python. To import a module, use the `import` keyword followed by the module name. We usually also give the module a short name (an alias) with `as`. The module's functions are then called through this prefix, which keeps them apart from functions with the same names in other modules while saving typing. By convention, the alias for `pandas` is `pd`. The whole line looks like this:

In [1]:
import pandas as pd
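A small check, assuming `pandas` is installed, that the alias is only a second name for the same module object, so `pd.read_csv` and `pandas.read_csv` refer to one and the same function:

```python
import pandas as pd  # conventional short alias
import pandas        # the same module under its full name

# Both names point to the same module object
print(pd is pandas)  # True
print(pd.__name__)   # pandas

# Functions are reached through the prefix, so both spellings give the same function
print(pd.read_csv is pandas.read_csv)  # True
```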

This line imports `pandas` and makes all of its functions available. To use them, prefix their names with `pd.`. We will now use the `read_csv` function, which opens a `csv` file on disk, reads it, and stores its contents in a `pandas` `DataFrame`, a structure for holding data in tabular form. We will load the file `drugs.csv`, available in the course repository in the `data/drugs` folder. The file contains data on the use of psychoactive substances among people of different ages. The variables in the file are described below (the description is also available in `README.md` in the `drugs` directory).

### Drug Use By Age

This directory contains data behind the story [How Baby Boomers Get High](http://fivethirtyeight.com/datalab/how-baby-boomers-get-high/). It covers 13 drugs across 17 age groups.

Source: [National Survey on Drug Use and Health from the Substance Abuse and Mental Health Data Archive](http://www.icpsr.umich.edu/icpsrweb/content/SAMHDA/index.html).

Header | Definition
---|---------
`alcohol-use` | Percentage of those in an age group who used alcohol in the past 12 months
`alcohol-frequency` | Median number of times a user in an age group used alcohol in the past 12 months
`marijuana-use` | Percentage of those in an age group who used marijuana in the past 12 months
`marijuana-frequency` | Median number of times a user in an age group used marijuana in the past 12 months
`cocaine-use` | Percentage of those in an age group who used cocaine in the past 12 months
`cocaine-frequency` | Median number of times a user in an age group used cocaine in the past 12 months
`crack-use` | Percentage of those in an age group who used crack in the past 12 months
`crack-frequency` | Median number of times a user in an age group used crack in the past 12 months
`heroin-use` | Percentage of those in an age group who used heroin in the past 12 months
`heroin-frequency` | Median number of times a user in an age group used heroin in the past 12 months
`hallucinogen-use` | Percentage of those in an age group who used hallucinogens in the past 12 months
`hallucinogen-frequency` | Median number of times a user in an age group used hallucinogens in the past 12 months
`inhalant-use` | Percentage of those in an age group who used inhalants in the past 12 months
`inhalant-frequency` | Median number of times a user in an age group used inhalants in the past 12 months
`pain-releiver-use` | Percentage of those in an age group who used pain relievers in the past 12 months
`pain-releiver-frequency` | Median number of times a user in an age group used pain relievers in the past 12 months
`oxycontin-use` | Percentage of those in an age group who used oxycontin in the past 12 months
`oxycontin-frequency` | Median number of times a user in an age group used oxycontin in the past 12 months
`tranquilizer-use` | Percentage of those in an age group who used tranquilizers in the past 12 months
`tranquilizer-frequency` | Median number of times a user in an age group used tranquilizers in the past 12 months
`stimulant-use` | Percentage of those in an age group who used stimulants in the past 12 months
`stimulant-frequency` | Median number of times a user in an age group used stimulants in the past 12 months
`meth-use` | Percentage of those in an age group who used meth in the past 12 months
`meth-frequency` | Median number of times a user in an age group used meth in the past 12 months
`sedative-use` | Percentage of those in an age group who used sedatives in the past 12 months
`sedative-frequency` | Median number of times a user in an age group used sedatives in the past 12 months


## Import

In [2]:
drugs = pd.read_csv('data/drugs/drugs.csv')
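If the course repository is not at hand, `read_csv` can be tried on text held in memory, because it accepts any file-like object, not only a path. This is a minimal sketch: the three rows below are copied from the `drugs` table displayed in this notebook, restricted to the `age`, `n` and `alcohol-use` columns. Here `age` is parsed as integers, whereas in the full file it has dtype `object` because of labels such as `22-23`.

```python
import io

import pandas as pd

# Three rows taken from the drugs table, kept in memory instead of on disk
csv_text = """age,n,alcohol-use
12,2798,3.9
13,2757,8.5
14,2792,18.1
"""

sample = pd.read_csv(io.StringIO(csv_text))  # a file-like object works like a path
print(type(sample).__name__)  # DataFrame
print(sample.shape)           # (3, 3): rows, columns
print(sample)
```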

## Different ways to display a `DataFrame`

In [3]:
drugs

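| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 2798 | 3.9 | 3.0 | 1.1 | 4.0 | 0.1 | 5.0 | 0.0 | NaN | ... | 0.1 | 24.5 | 0.2 | 52.0 | 0.2 | 2.0 | 0.0 | NaN | 0.2 | 13.0 |
| 1 | 13 | 2757 | 8.5 | 6.0 | 3.4 | 15.0 | 0.1 | 1.0 | 0.0 | 3.0 | ... | 0.1 | 41.0 | 0.3 | 25.5 | 0.3 | 4.0 | 0.1 | 5.0 | 0.1 | 19.0 |
| 2 | 14 | 2792 | 18.1 | 5.0 | 8.7 | 24.0 | 0.1 | 5.5 | 0.0 | NaN | ... | 0.4 | 4.5 | 0.9 | 5.0 | 0.8 | 12.0 | 0.1 | 24.0 | 0.2 | 16.5 |
| 3 | 15 | 2956 | 29.2 | 6.0 | 14.5 | 25.0 | 0.5 | 4.0 | 0.1 | 9.5 | ... | 0.8 | 3.0 | 2.0 | 4.5 | 1.5 | 6.0 | 0.3 | 10.5 | 0.4 | 30.0 |
| 4 | 16 | 3058 | 40.1 | 10.0 | 22.5 | 30.0 | 1.0 | 7.0 | 0.0 | 1.0 | ... | 1.1 | 4.0 | 2.4 | 11.0 | 1.8 | 9.5 | 0.3 | 36.0 | 0.2 | 3.0 |
| 5 | 17 | 3038 | 49.3 | 13.0 | 28.0 | 36.0 | 2.0 | 5.0 | 0.1 | 21.0 | ... | 1.4 | 6.0 | 3.5 | 7.0 | 2.8 | 9.0 | 0.6 | 48.0 | 0.5 | 6.5 |
| 6 | 18 | 2469 | 58.7 | 24.0 | 33.7 | 52.0 | 3.2 | 5.0 | 0.4 | 10.0 | ... | 1.7 | 7.0 | 4.9 | 12.0 | 3.0 | 8.0 | 0.5 | 12.0 | 0.4 | 10.0 |
| 7 | 19 | 2223 | 64.6 | 36.0 | 33.4 | 60.0 | 4.1 | 5.5 | 0.5 | 2.0 | ... | 1.5 | 7.5 | 4.2 | 4.5 | 3.3 | 6.0 | 0.4 | 105.0 | 0.3 | 6.0 |
| 8 | 20 | 2271 | 69.7 | 48.0 | 34.0 | 60.0 | 4.9 | 8.0 | 0.6 | 5.0 | ... | 1.7 | 12.0 | 5.4 | 10.0 | 4.0 | 12.0 | 0.9 | 12.0 | 0.5 | 4.0 |
| 9 | 21 | 2354 | 83.2 | 52.0 | 33.0 | 52.0 | 4.8 | 5.0 | 0.5 | 17.0 | ... | 1.3 | 13.5 | 3.9 | 7.0 | 4.1 | 10.0 | 0.6 | 2.0 | 0.3 | 9.0 |
| 10 | 22-23 | 4707 | 84.2 | 52.0 | 28.4 | 52.0 | 4.5 | 5.0 | 0.5 | 5.0 | ... | 1.7 | 17.5 | 4.4 | 12.0 | 3.6 | 10.0 | 0.6 | 46.0 | 0.2 | 52.0 |
| 11 | 24-25 | 4591 | 83.1 | 52.0 | 24.9 | 60.0 | 4.0 | 6.0 | 0.5 | 6.0 | ... | 1.3 | 20.0 | 4.3 | 10.0 | 2.6 | 10.0 | 0.7 | 21.0 | 0.2 | 17.5 |
| 12 | 26-29 | 2628 | 80.7 | 52.0 | 20.8 | 52.0 | 3.2 | 5.0 | 0.4 | 6.0 | ... | 1.2 | 13.5 | 4.2 | 10.0 | 2.3 | 7.0 | 0.6 | 30.0 | 0.4 | 4.0 |
| 13 | 30-34 | 2864 | 77.5 | 52.0 | 16.4 | 72.0 | 2.1 | 8.0 | 0.5 | 15.0 | ... | 0.9 | 46.0 | 3.6 | 8.0 | 1.4 | 12.0 | 0.4 | 54.0 | 0.4 | 10.0 |
| 14 | 35-49 | 7391 | 75.0 | 52.0 | 10.4 | 48.0 | 1.5 | 15.0 | 0.5 | 48.0 | ... | 0.3 | 12.0 | 1.9 | 6.0 | 0.6 | 24.0 | 0.2 | 104.0 | 0.3 | 10.0 |
| 15 | 50-64 | 3923 | 67.2 | 52.0 | 7.3 | 52.0 | 0.9 | 36.0 | 0.4 | 62.0 | ... | 0.4 | 5.0 | 1.4 | 10.0 | 0.3 | 24.0 | 0.2 | 30.0 | 0.2 | 104.0 |
| 16 | 65+ | 2448 | 49.3 | 52.0 | 1.2 | 36.0 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | NaN | 0.2 | 5.0 | 0.0 | 364.0 | 0.0 | NaN | 0.0 | 15.0 |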


In [4]:
drugs.head()

| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 2798 | 3.9 | 3.0 | 1.1 | 4.0 | 0.1 | 5.0 | 0.0 | NaN | ... | 0.1 | 24.5 | 0.2 | 52.0 | 0.2 | 2.0 | 0.0 | NaN | 0.2 | 13.0 |
| 1 | 13 | 2757 | 8.5 | 6.0 | 3.4 | 15.0 | 0.1 | 1.0 | 0.0 | 3.0 | ... | 0.1 | 41.0 | 0.3 | 25.5 | 0.3 | 4.0 | 0.1 | 5.0 | 0.1 | 19.0 |
| 2 | 14 | 2792 | 18.1 | 5.0 | 8.7 | 24.0 | 0.1 | 5.5 | 0.0 | NaN | ... | 0.4 | 4.5 | 0.9 | 5.0 | 0.8 | 12.0 | 0.1 | 24.0 | 0.2 | 16.5 |
| 3 | 15 | 2956 | 29.2 | 6.0 | 14.5 | 25.0 | 0.5 | 4.0 | 0.1 | 9.5 | ... | 0.8 | 3.0 | 2.0 | 4.5 | 1.5 | 6.0 | 0.3 | 10.5 | 0.4 | 30.0 |
| 4 | 16 | 3058 | 40.1 | 10.0 | 22.5 | 30.0 | 1.0 | 7.0 | 0.0 | 1.0 | ... | 1.1 | 4.0 | 2.4 | 11.0 | 1.8 | 9.5 | 0.3 | 36.0 | 0.2 | 3.0 |


In [5]:
drugs.tail()

| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 26-29 | 2628 | 80.7 | 52.0 | 20.8 | 52.0 | 3.2 | 5.0 | 0.4 | 6.0 | ... | 1.2 | 13.5 | 4.2 | 10.0 | 2.3 | 7.0 | 0.6 | 30.0 | 0.4 | 4.0 |
| 13 | 30-34 | 2864 | 77.5 | 52.0 | 16.4 | 72.0 | 2.1 | 8.0 | 0.5 | 15.0 | ... | 0.9 | 46.0 | 3.6 | 8.0 | 1.4 | 12.0 | 0.4 | 54.0 | 0.4 | 10.0 |
| 14 | 35-49 | 7391 | 75.0 | 52.0 | 10.4 | 48.0 | 1.5 | 15.0 | 0.5 | 48.0 | ... | 0.3 | 12.0 | 1.9 | 6.0 | 0.6 | 24.0 | 0.2 | 104.0 | 0.3 | 10.0 |
| 15 | 50-64 | 3923 | 67.2 | 52.0 | 7.3 | 52.0 | 0.9 | 36.0 | 0.4 | 62.0 | ... | 0.4 | 5.0 | 1.4 | 10.0 | 0.3 | 24.0 | 0.2 | 30.0 | 0.2 | 104.0 |
| 16 | 65+ | 2448 | 49.3 | 52.0 | 1.2 | 36.0 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | NaN | 0.2 | 5.0 | 0.0 | 364.0 | 0.0 | NaN | 0.0 | 15.0 |
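`head()` and `tail()` show 5 rows by default, and both take an optional number of rows. A minimal sketch on a small stand-in `DataFrame` holding the `age` and `n` values of the first eight age groups from `drugs` (`age` is stored as text, as in the full data):

```python
import pandas as pd

# Stand-in for `drugs`: first eight age groups, two columns only
df = pd.DataFrame({
    "age": ["12", "13", "14", "15", "16", "17", "18", "19"],
    "n": [2798, 2757, 2792, 2956, 3058, 3038, 2469, 2223],
})

print(df.head())   # first 5 rows (default)
print(df.head(2))  # first 2 rows
print(df.tail(3))  # last 3 rows, index labels 5, 6 and 7
```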


In [6]:
drugs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 28 columns):
age                        17 non-null object
n                          17 non-null int64
alcohol-use                17 non-null float64
alcohol-frequency          17 non-null float64
marijuana-use              17 non-null float64
marijuana-frequency        17 non-null float64
cocaine-use                17 non-null float64
cocaine-frequency          16 non-null float64
crack-use                  17 non-null float64
crack-frequency            14 non-null float64
heroin-use                 17 non-null float64
heroin-frequency           16 non-null float64
hallucinogen-use           17 non-null float64
hallucinogen-frequency     17 non-null float64
inhalant-use               17 non-null float64
inhalant-frequency         16 non-null float64
pain-releiver-use          17 non-null float64
pain-releiver-frequency    17 non-null float64
oxycontin-use              17 non-null float64
oxycont
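The non-null counts reported by `info()` reveal missing values (`NaN`). A sketch of getting the same information programmatically, assuming a small stand-in built from the first five rows of the `crack-use` and `crack-frequency` columns of `drugs`:

```python
import numpy as np
import pandas as pd

# Stand-in: first five age groups, crack columns; NaN marks a missing frequency
df = pd.DataFrame({
    "age": ["12", "13", "14", "15", "16"],
    "crack-use": [0.0, 0.0, 0.0, 0.1, 0.0],
    "crack-frequency": [np.nan, 3.0, np.nan, 9.5, 1.0],
})

print(df.shape)          # (5, 3): number of rows and columns
print(df.dtypes)         # per-column types, as listed by info()
print(df.notna().sum())  # non-null counts per column, as in info()
print(df.isna().sum())   # missing values per column
```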

## Selecting individual columns

In [7]:
drugs['age']

0        12
1        13
2        14
3        15
4        16
5        17
6        18
7        19
8        20
9        21
10    22-23
11    24-25
12    26-29
13    30-34
14    35-49
15    50-64
16      65+
Name: age, dtype: object

In [8]:
drugs['alcohol-frequency']

0      3.0
1      6.0
2      5.0
3      6.0
4     10.0
5     13.0
6     24.0
7     36.0
8     48.0
9     52.0
10    52.0
11    52.0
12    52.0
13    52.0
14    52.0
15    52.0
16    52.0
Name: alcohol-frequency, dtype: float64

In [9]:
drugs[['age', 'marijuana-use', 'marijuana-frequency']]

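| | age | marijuana-use | marijuana-frequency |
|---|---|---|---|
| 0 | 12 | 1.1 | 4.0 |
| 1 | 13 | 3.4 | 15.0 |
| 2 | 14 | 8.7 | 24.0 |
| 3 | 15 | 14.5 | 25.0 |
| 4 | 16 | 22.5 | 30.0 |
| 5 | 17 | 28.0 | 36.0 |
| 6 | 18 | 33.7 | 52.0 |
| 7 | 19 | 33.4 | 60.0 |
| 8 | 20 | 34.0 | 60.0 |
| 9 | 21 | 33.0 | 52.0 |
| 10 | 22-23 | 28.4 | 52.0 |
| 11 | 24-25 | 24.9 | 60.0 |
| 12 | 26-29 | 20.8 | 52.0 |
| 13 | 30-34 | 16.4 | 72.0 |
| 14 | 35-49 | 10.4 | 48.0 |
| 15 | 50-64 | 7.3 | 52.0 |
| 16 | 65+ | 1.2 | 36.0 |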
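Single and double brackets return different types: a single column name gives a `Series`, while a list of names, even a one-element list, gives a `DataFrame`. A minimal sketch on a stand-in with the first three rows of the marijuana columns of `drugs`:

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["12", "13", "14"],
    "marijuana-use": [1.1, 3.4, 8.7],
    "marijuana-frequency": [4.0, 15.0, 24.0],
})

one = df["marijuana-use"]            # single name -> Series
many = df[["age", "marijuana-use"]]  # list of names -> DataFrame
also_frame = df[["marijuana-use"]]   # one-element list -> still a DataFrame

print(type(one).__name__)         # Series
print(type(many).__name__)        # DataFrame
print(type(also_frame).__name__)  # DataFrame
```

Because these column names contain hyphens, attribute access does not work for them: `df.marijuana-use` would be parsed as `df.marijuana` minus `use`. Bracket notation always works.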


## Selecting rows

In [10]:
drugs[0:5]

| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 2798 | 3.9 | 3.0 | 1.1 | 4.0 | 0.1 | 5.0 | 0.0 | NaN | ... | 0.1 | 24.5 | 0.2 | 52.0 | 0.2 | 2.0 | 0.0 | NaN | 0.2 | 13.0 |
| 1 | 13 | 2757 | 8.5 | 6.0 | 3.4 | 15.0 | 0.1 | 1.0 | 0.0 | 3.0 | ... | 0.1 | 41.0 | 0.3 | 25.5 | 0.3 | 4.0 | 0.1 | 5.0 | 0.1 | 19.0 |
| 2 | 14 | 2792 | 18.1 | 5.0 | 8.7 | 24.0 | 0.1 | 5.5 | 0.0 | NaN | ... | 0.4 | 4.5 | 0.9 | 5.0 | 0.8 | 12.0 | 0.1 | 24.0 | 0.2 | 16.5 |
| 3 | 15 | 2956 | 29.2 | 6.0 | 14.5 | 25.0 | 0.5 | 4.0 | 0.1 | 9.5 | ... | 0.8 | 3.0 | 2.0 | 4.5 | 1.5 | 6.0 | 0.3 | 10.5 | 0.4 | 30.0 |
| 4 | 16 | 3058 | 40.1 | 10.0 | 22.5 | 30.0 | 1.0 | 7.0 | 0.0 | 1.0 | ... | 1.1 | 4.0 | 2.4 | 11.0 | 1.8 | 9.5 | 0.3 | 36.0 | 0.2 | 3.0 |


In [11]:
drugs[10:]

| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 22-23 | 4707 | 84.2 | 52.0 | 28.4 | 52.0 | 4.5 | 5.0 | 0.5 | 5.0 | ... | 1.7 | 17.5 | 4.4 | 12.0 | 3.6 | 10.0 | 0.6 | 46.0 | 0.2 | 52.0 |
| 11 | 24-25 | 4591 | 83.1 | 52.0 | 24.9 | 60.0 | 4.0 | 6.0 | 0.5 | 6.0 | ... | 1.3 | 20.0 | 4.3 | 10.0 | 2.6 | 10.0 | 0.7 | 21.0 | 0.2 | 17.5 |
| 12 | 26-29 | 2628 | 80.7 | 52.0 | 20.8 | 52.0 | 3.2 | 5.0 | 0.4 | 6.0 | ... | 1.2 | 13.5 | 4.2 | 10.0 | 2.3 | 7.0 | 0.6 | 30.0 | 0.4 | 4.0 |
| 13 | 30-34 | 2864 | 77.5 | 52.0 | 16.4 | 72.0 | 2.1 | 8.0 | 0.5 | 15.0 | ... | 0.9 | 46.0 | 3.6 | 8.0 | 1.4 | 12.0 | 0.4 | 54.0 | 0.4 | 10.0 |
| 14 | 35-49 | 7391 | 75.0 | 52.0 | 10.4 | 48.0 | 1.5 | 15.0 | 0.5 | 48.0 | ... | 0.3 | 12.0 | 1.9 | 6.0 | 0.6 | 24.0 | 0.2 | 104.0 | 0.3 | 10.0 |
| 15 | 50-64 | 3923 | 67.2 | 52.0 | 7.3 | 52.0 | 0.9 | 36.0 | 0.4 | 62.0 | ... | 0.4 | 5.0 | 1.4 | 10.0 | 0.3 | 24.0 | 0.2 | 30.0 | 0.2 | 104.0 |
| 16 | 65+ | 2448 | 49.3 | 52.0 | 1.2 | 36.0 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | NaN | 0.2 | 5.0 | 0.0 | 364.0 | 0.0 | NaN | 0.0 | 15.0 |


In [12]:
drugs[-3:]

| | age | n | alcohol-use | alcohol-frequency | marijuana-use | marijuana-frequency | cocaine-use | cocaine-frequency | crack-use | crack-frequency | ... | oxycontin-use | oxycontin-frequency | tranquilizer-use | tranquilizer-frequency | stimulant-use | stimulant-frequency | meth-use | meth-frequency | sedative-use | sedative-frequency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 35-49 | 7391 | 75.0 | 52.0 | 10.4 | 48.0 | 1.5 | 15.0 | 0.5 | 48.0 | ... | 0.3 | 12.0 | 1.9 | 6.0 | 0.6 | 24.0 | 0.2 | 104.0 | 0.3 | 10.0 |
| 15 | 50-64 | 3923 | 67.2 | 52.0 | 7.3 | 52.0 | 0.9 | 36.0 | 0.4 | 62.0 | ... | 0.4 | 5.0 | 1.4 | 10.0 | 0.3 | 24.0 | 0.2 | 30.0 | 0.2 | 104.0 |
| 16 | 65+ | 2448 | 49.3 | 52.0 | 1.2 | 36.0 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | NaN | 0.2 | 5.0 | 0.0 | 364.0 | 0.0 | NaN | 0.0 | 15.0 |
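Slicing rows with `[]` works by position, like slicing a Python list: the end position is excluded and negative positions count from the end. A minimal sketch with a six-row stand-in (ages 12 to 17 from `drugs`):

```python
import pandas as pd

df = pd.DataFrame({
    "age": ["12", "13", "14", "15", "16", "17"],
    "n": [2798, 2757, 2792, 2956, 3058, 3038],
})

print(df[0:2])   # positions 0 and 1; end position 2 is excluded
print(df[4:])    # from position 4 to the end
print(df[-2:])   # the last two rows
print(df[::2])   # every second row: positions 0, 2 and 4
```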


## `loc`

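Selection by label. Pass two selectors separated by a comma: the first picks rows, the second columns. Each can be a single label, a list of labels, or a slice (a bare `:` means "all"). Unlike ordinary Python slices, a label slice in `loc` includes its end label, so `6:8` returns rows 6, 7 and 8.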

In [13]:
drugs.loc[6:8, ['age', 'cocaine-use']]

| | age | cocaine-use |
|---|---|---|
| 6 | 18 | 3.2 |
| 7 | 19 | 4.1 |
| 8 | 20 | 4.9 |


In [14]:
drugs.loc[8, :]

age                          20
n                          2271
alcohol-use                69.7
alcohol-frequency            48
marijuana-use                34
marijuana-frequency          60
cocaine-use                 4.9
cocaine-frequency             8
crack-use                   0.6
crack-frequency               5
heroin-use                  0.9
heroin-frequency             45
hallucinogen-use            7.4
hallucinogen-frequency        2
inhalant-use                1.5
inhalant-frequency            4
pain-releiver-use            10
pain-releiver-frequency      10
oxycontin-use               1.7
oxycontin-frequency          12
tranquilizer-use            5.4
tranquilizer-frequency       10
stimulant-use                 4
stimulant-frequency          12
meth-use                    0.9
meth-frequency               12
sedative-use                0.5
sedative-frequency            4
Name: 8, dtype: object
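The difference between labels and positions becomes visible once the index is no longer 0, 1, 2 and so on. A minimal sketch, assuming the age groups 18 to 21 from `drugs` are used as string index labels:

```python
import pandas as pd

# Age groups as index labels instead of the default 0, 1, 2, ...
df = pd.DataFrame(
    {"n": [2469, 2223, 2271, 2354], "cocaine-use": [3.2, 4.1, 4.9, 4.8]},
    index=["18", "19", "20", "21"],
)

print(df.loc["19":"20", ["cocaine-use"]])  # label slice: both "19" and "20" are included
print(df.loc["20", :])                     # one label -> that row as a Series
print(df.loc[["18", "21"], "n"])           # a list of labels and a single column
```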

## `iloc`

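Selection by integer position. As with `loc`, pass two selectors separated by a comma: rows first, then columns. Here slices follow the usual Python rule and exclude the end position, so `6:10` selects the columns at positions 6 to 9, and `2::2` takes every second column starting at position 2.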

In [19]:
drugs.iloc[:5, 6:10]

| | cocaine-use | cocaine-frequency | crack-use | crack-frequency |
|---|---|---|---|---|
| 0 | 0.1 | 5.0 | 0.0 | NaN |
| 1 | 0.1 | 1.0 | 0.0 | 3.0 |
| 2 | 0.1 | 5.5 | 0.0 | NaN |
| 3 | 0.5 | 4.0 | 0.1 | 9.5 |
| 4 | 1.0 | 7.0 | 0.0 | 1.0 |


In [23]:
drugs.iloc[[1, 3, 8], 2::2]

| | alcohol-use | marijuana-use | cocaine-use | crack-use | heroin-use | hallucinogen-use | inhalant-use | pain-releiver-use | oxycontin-use | tranquilizer-use | stimulant-use | meth-use | sedative-use |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8.5 | 3.4 | 0.1 | 0.0 | 0.0 | 0.6 | 2.5 | 2.4 | 0.1 | 0.3 | 0.3 | 0.1 | 0.1 |
| 3 | 29.2 | 14.5 | 0.5 | 0.1 | 0.2 | 2.1 | 2.5 | 5.5 | 0.8 | 2.0 | 1.5 | 0.3 | 0.4 |
| 8 | 69.7 | 34.0 | 4.9 | 0.6 | 0.9 | 7.4 | 1.5 | 10.0 | 1.7 | 5.4 | 4.0 | 0.9 | 0.5 |
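A sketch contrasting `iloc` and `loc` on a stand-in whose index keeps the labels 6 to 9 from `drugs` (ages 18 to 21) but therefore no longer starts at 0:

```python
import pandas as pd

df = pd.DataFrame(
    {"age": ["18", "19", "20", "21"], "cocaine-use": [3.2, 4.1, 4.9, 4.8]},
    index=[6, 7, 8, 9],  # labels copied from drugs; positions are still 0 to 3
)

print(df.iloc[0:2])    # positions 0 and 1 -> labels 6 and 7 (end position excluded)
print(df.loc[6:7])     # labels 6 through 7 (end label included) -> the same two rows
print(df.iloc[-1, 0])  # last row, first column -> '21'
```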


## Simple statistics

In [15]:
drugs['n'].sum()

55268

In [16]:
drugs['marijuana-use'].max()

34.0

In [24]:
drugs.loc[drugs['marijuana-use'].idxmax(), :]  # idxmax returns a row label, so look it up with loc

age                          20
n                          2271
alcohol-use                69.7
alcohol-frequency            48
marijuana-use                34
marijuana-frequency          60
cocaine-use                 4.9
cocaine-frequency             8
crack-use                   0.6
crack-frequency               5
heroin-use                  0.9
heroin-frequency             45
hallucinogen-use            7.4
hallucinogen-frequency        2
inhalant-use                1.5
inhalant-frequency            4
pain-releiver-use            10
pain-releiver-frequency      10
oxycontin-use               1.7
oxycontin-frequency          12
tranquilizer-use            5.4
tranquilizer-frequency       10
stimulant-use                 4
stimulant-frequency          12
meth-use                    0.9
meth-frequency               12
sedative-use                0.5
sedative-frequency            4
Name: 8, dtype: object

In [26]:
drugs.loc[drugs['alcohol-frequency'].idxmax(), :]  # idxmax returns a row label, so look it up with loc

age                          21
n                          2354
alcohol-use                83.2
alcohol-frequency            52
marijuana-use                33
marijuana-frequency          52
cocaine-use                 4.8
cocaine-frequency             5
crack-use                   0.5
crack-frequency              17
heroin-use                  0.6
heroin-frequency             30
hallucinogen-use            6.3
hallucinogen-frequency        4
inhalant-use                1.4
inhalant-frequency            2
pain-releiver-use             9
pain-releiver-frequency      15
oxycontin-use               1.3
oxycontin-frequency        13.5
tranquilizer-use            3.9
tranquilizer-frequency        7
stimulant-use               4.1
stimulant-frequency          10
meth-use                    0.6
meth-frequency                2
sedative-use                0.3
sedative-frequency            9
Name: 9, dtype: object
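`idxmax()` returns the index label of the maximum, not its position, so the natural way to fetch the whole row is `loc`. A minimal sketch that uses the age groups 18 to 21 from `drugs` as string labels, which makes the distinction visible:

```python
import pandas as pd

# Age groups as the index: idxmax then returns an age label, not a row position
df = pd.DataFrame(
    {
        "marijuana-use": [33.7, 33.4, 34.0, 33.0],
        "alcohol-frequency": [24.0, 36.0, 48.0, 52.0],
    },
    index=["18", "19", "20", "21"],
)

label = df["marijuana-use"].idxmax()  # label of the (first) maximum
print(label)                          # 20
print(df.loc[label])                  # the whole row, looked up by label
print(df["marijuana-use"].mean())     # other one-line statistics: mean, min, median, ...
print(df["alcohol-frequency"].idxmax())  # 21
```

With this string index, `df.iloc[label]` would raise an error, because `iloc` accepts only integer positions; in `drugs` the two approaches give the same row only because its index is the default `RangeIndex` (0 to 16). When the maximum occurs several times, `idxmax` returns the first matching label, which is why the `alcohol-frequency` cell above returns row 9 (age 21) even though the later age groups also have the value 52.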