In [None]:
import numpy as np
from random import random

Pandas uses the NumPy array library under the hood. NumPy provides a set of classes and types for very fast work with arrays of any dimension. Much of it is implemented in C, with many optimisations for speed and memory use. Let's compare its speed with pure Python:

In [None]:
%timeit sum([random() for _ in range(1000000)])

In [None]:
%timeit np.sum(np.random.rand(1000000))
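
The `%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module. The helper names `pure_python_sum` and `numpy_sum` and the run count of 3 are illustrative assumptions, not part of the original cells.

```python
import timeit
from random import random

import numpy as np

N = 1_000_000  # same number of elements as in the cells above
RUNS = 3       # arbitrary small number of repetitions


def pure_python_sum():
    # Builds a Python list of floats and sums it in the interpreter loop
    return sum([random() for _ in range(N)])


def numpy_sum():
    # Generates and sums the values inside NumPy's compiled code
    return np.sum(np.random.rand(N))


# Average wall-clock time per call
python_time = timeit.timeit(pure_python_sum, number=RUNS) / RUNS
numpy_time = timeit.timeit(numpy_sum, number=RUNS) / RUNS
print(f"pure Python: {python_time:.4f} s, NumPy: {numpy_time:.4f} s")
```

The exact numbers depend on the machine, so treat them as a rough comparison only.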

The two basic data types of Pandas are Series and DataFrame. A Series is a one-dimensional data store. It resembles a traditional dictionary in that it has both values and an index.

In [None]:
from pandas import Series, DataFrame

In [None]:
prime_series = Series([2, 3, 5, 7, 11, 13, 17])
prime_series

In [None]:
prime_series.index

In [None]:
prime_series.values

In [None]:
prime_series.name = "Primes"
prime_series.index.name = "Order"
prime_series

In [None]:
prime_series2 = Series([2, 3, 5, 7, 11, 13, 17], index=["first", "second", "third", "fourth", "fifth", "sixth", "seventh"])
prime_series2
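
To make the dictionary comparison concrete, here is a minimal sketch using a small, hypothetical Series (`primes`) built directly from a dict. It shows the dict-like operations together with the ordered, positional access that a plain dict does not offer.

```python
from pandas import Series

# Hypothetical example data: a Series built straight from a dict
primes = Series({"first": 2, "second": 3, "third": 5})

# Dict-like behaviour: lookup by label and membership test
second = primes["second"]
has_third = "third" in primes  # checks the index, not the values

# Array-like behaviour: positional slicing keeps the labels
first_two = primes.iloc[:2]

# Converting back to a plain dict
as_dict = primes.to_dict()
```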

Series supports many fancy indexing facilities. Besides retrieving values, you can also update them.

In [None]:
prime_series2[["second", "fourth", "fifth"]]

In [None]:
prime_series2[["first", "seventh", "fifth"]] = 0
prime_series2

We can also filter values by using a Boolean Series as an indexer.

In [None]:
prime_series2 != 0

In [None]:
prime_series2[prime_series2 != 0]

In [None]:
prime_series2[(prime_series2!=0) & (prime_series2<10)]
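
The parentheses in the combined condition matter: `&` binds more tightly than comparison operators, so each comparison has to be wrapped before the masks are combined. The sketch below uses hypothetical sample data (`values`) and names each mask to show the element-wise AND, OR and NOT operators.

```python
from pandas import Series

values = Series([0, 3, 5, 7, 0, 13, 0])  # hypothetical sample data

# Each comparison produces a Boolean Series with the same index
non_zero = values != 0
small = values < 10

both = values[non_zero & small]    # element-wise AND
either = values[non_zero | small]  # element-wise OR
negated = values[~non_zero]        # element-wise NOT
```

Python's `and`, `or` and `not` keywords do not work here, because they try to reduce a whole Series to a single truth value.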

Basic operations are applied to every element.

In [None]:
prime_series2[prime_series2 != 0] / 3

In [None]:
prime_series2.apply(lambda x: x * 2)  # returns a new Series; the result is discarded here
prime_series2  # the original Series is unchanged

In [None]:
prime_series2[prime_series2 != 0].apply(lambda x: 3/x)
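
For simple arithmetic, `apply` with a lambda and a plain vectorised expression give the same values. The sketch below compares the two on hypothetical sample data (`values`) and also checks that neither form modifies the original Series.

```python
import numpy as np
from pandas import Series

values = Series([3, 5, 7, 13])  # hypothetical sample data

via_apply = values.apply(lambda x: 3 / x)  # calls the lambda once per element
vectorised = 3 / values                     # one operation on the whole Series

same = np.allclose(via_apply, vectorised)
unchanged = values.tolist() == [3, 5, 7, 13]
```

The vectorised form is usually the better default, since it avoids a Python-level function call for every element.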

A DataFrame is a two-dimensional data store. Roughly speaking, we can treat it as a collection of Series that share the same index. It offers many additional functions and ways of initialisation.

In [None]:
raw_data = {    
    "Bar": (42.0912, 19.0899),
    "Ulcinj": (41.9311, 19.2148),
    "Petrovac": (42.2053, 18.9458),
    "Budva": (42.2911, 18.8403),
    "Tivat": (42.4350, 18.7066),
    "Kotor": (42.4247, 18.7712),
    "Herceg Novi": (42.4572, 18.5315),
    "Podgorica": (42.4304, 19.2594),
    "Kolasin": (42.8205, 19.5241),
    "Cetinje": (42.3931, 18.9116),
    "Niksic": (42.7805, 18.9562),
    "Zabljak": (43.1555, 19.1226),
    "Danilovgrad": (42.5538, 19.1077),
    "Pljevlja": (43.3582, 19.3513),
    "Bijelo Polje": (43.0369, 19.7562),
}

transformed = {"name": [], "latitude": [], "longitude": []}

for city, (lat, lng) in raw_data.items():
    transformed["name"].append(city)
    transformed["latitude"].append(lat)
    transformed["longitude"].append(lng)
    
monty_cities = DataFrame(transformed)
monty_cities
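
As one of the other ways of initialisation, the manual loop can be replaced with `DataFrame.from_dict`. This is a sketch on a shortened copy of the data (`sample`, a name introduced here) so that it runs on its own. Note that it puts the city names into the index right away instead of into a `name` column.

```python
from pandas import DataFrame

# A shortened copy of raw_data so the block is self-contained
sample = {
    "Bar": (42.0912, 19.0899),
    "Ulcinj": (41.9311, 19.2148),
    "Budva": (42.2911, 18.8403),
}

# orient="index" turns each dict key into a row label,
# and each tuple into the values of that row
cities = DataFrame.from_dict(
    sample, orient="index", columns=["latitude", "longitude"]
)
cities.index.name = "name"
```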

In [None]:
monty_cities.columns

In [None]:
monty_cities.index

We can manipulate the index.

In [None]:
monty_cities.set_index('name', drop=True, inplace=True, verify_integrity=True)  # drop=True is the default; inplace and verify_integrity default to False
monty_cities

In [None]:
monty_cities.latitude  # same as monty_cities["latitude"]

By default, the indexer returns columns. To access rows, use the built-in properties loc and iloc.

In [None]:
monty_cities.loc[["Bar", "Kolasin", "Pljevlja"]]

In [None]:
monty_cities.iloc[[2, 3, 5, 7], [1]]

Almost anything you can think of works as an indexer, including a Series.

In [None]:
monty_cities.iloc[prime_series2[prime_series2>0]]

In [None]:
monty_cities.loc['Kotor':'Cetinje']
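
One difference between the two properties is easy to miss: a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position, like ordinary Python slicing. A minimal sketch on a small, hypothetical frame (`frame`):

```python
from pandas import DataFrame

frame = DataFrame(
    {"latitude": [42.0912, 41.9311, 42.2053, 42.2911]},
    index=["Bar", "Ulcinj", "Petrovac", "Budva"],
)

by_label = frame.loc["Ulcinj":"Budva"]  # includes the "Budva" row
by_position = frame.iloc[1:3]           # stops before position 3
```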

In [None]:
monty_cities['dist'] = np.sqrt(monty_cities.latitude ** 2 + monty_cities.longitude ** 2)
monty_cities

In [None]:
monty_cities.drop('dist', axis='columns', inplace=True)
monty_cities
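
The two cells above modify the frame in place. If the original should stay untouched, the same steps can be sketched with methods that return new objects: `assign` to add a column and `drop` without `inplace`. The small frame below (`frame`) is a hypothetical stand-in for `monty_cities`.

```python
import numpy as np
from pandas import DataFrame

frame = DataFrame(
    {"latitude": [42.0912, 41.9311], "longitude": [19.0899, 19.2148]},
    index=["Bar", "Ulcinj"],
)

# assign returns a new frame with the extra column; frame itself is unchanged
with_dist = frame.assign(
    dist=np.sqrt(frame.latitude ** 2 + frame.longitude ** 2)
)

# drop without inplace=True also returns a new frame
without_dist = with_dist.drop(columns="dist")
```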