# Using Python to interact with LUSID
This tutorial is designed to be a starting point for writing simple Python scripts in the Jupyter environment to interact with LUSID programmatically. The tutorial will cover:
- Using Jupyter
- Basic Python scripting
- Using NumPy to efficiently work with large, multi-dimensional data
- Using Pandas to work with dataframes
- Using the Python LUSID SDKs

## Basic Python scripting
This section of the tutorial serves as a brief introduction to Python and writing some Python code. We'll cover:
- What is Python
- Assigning variables and built-in types
- Python Sequence types
- Python Mapping types
- Decision statements
- Loops
- Functions
- Classes and objects

### What is Python?
Guido van Rossum began developing Python in 1989 as a programming language that should be:
- Easy and intuitive, yet just as powerful as its major competitors
- Open source, so anyone can contribute to its development
- As understandable as plain English
- Suitable for everyday tasks, allowing for short development times

Python is interpreted, so there is no separate build step before running your code, and you can write code interactively. In Jupyter, you write your code in a code cell and execute it by clicking the run button - it's as simple as that!

### Assigning variables and built-in types
Here we'll learn how to create variables and look at some of the built-in types that Python provides.
#### Assigning variables
Python is dynamically typed, so you don't have to declare the type of a variable before using it. By [convention](https://peps.python.org/pep-0008/#function-and-variable-names), multi-word variable names are lower case, with words separated by underscores.

In [1]:
# This is a comment, it is ignored by the interpreter

# Assigning hello world to the variable x
# in other languages you would have to declare that x is a string before doing this
x = 'hello world'
x

'hello world'

In [2]:
# Assigning 7 to the variable y
# in other languages you would have to declare that y is an integer before doing this
y = 7
y

7

In [3]:
# In Python types are dynamic
y = 'not just a number'
y

'not just a number'

#### Built-in types
Here we'll introduce some of the built-in types that are provided out-of-the-box with Python:
- Numeric types - int, float
- The Text Sequence type - str
- Boolean Values
- The Null Object - None

##### Numeric types - int, float
In Python, integers are zero, positive or negative whole numbers with no fractional part and unlimited precision.\
Floats represent floating-point numbers with the same precision as the double type in other common languages.
You can use common mathematical operations with both floats and integers.

In [4]:
# Some integers
a = 1
b = 2000
c = 0
d = -99

# Some floats
e = 0.1
f = -25.972

In [5]:
# adding ints
a + b

2001

In [6]:
# adding floats
e + f
# as f is negative it's subtracted from e, as you would expect

-25.872

In [7]:
# some more complex math
2*(a+b)/e

40020.0

In [8]:
f = a + b
f + 1

2002
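
To illustrate the precision points above, here is a minimal sketch (the variable names are illustrative only). Integers grow as large as needed, while floats are binary doubles and carry small rounding errors, so it is usually safer to compare them with a tolerance:

```python
import math

# ints have arbitrary precision: very large values do not overflow
big = 2 ** 100
print(big)

# floats are binary doubles, so some decimal fractions are not stored exactly
total = 0.1 + 0.2
print(total == 0.3)              # the exact comparison is False
print(math.isclose(total, 0.3))  # comparing with a tolerance is True

# / always returns a float, // floors the result, % gives the remainder
print(7 / 2, 7 // 2, 7 % 2)
```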

##### The Text Sequence type - str
Strings represent text data. In Python, strings are immutable sequences of Unicode characters:

In [9]:
# Assigning the string hello world to the variable x
x = 'hello world'
# Also assigning the string hello world to the variable x
x = "hello world"

single_quotes ='allows embedded "double" quotes'

double_quotes = "allows embedded 'single' quotes"

# There are also multi-line strings:
y = '''So
many
lines!
'''
print(y)

So
many
lines!



Strings have many useful methods for manipulating them. Here are some examples:

In [10]:
x = 'hello world'
# Return a copy of the string with its first character capitalized and the rest lowercased.
print(x.capitalize())
# Return a copy of the string with all the cased characters converted to uppercase
print(x.upper())
# Return True if all cased characters in the string are lowercase and there is at least one cased character, False otherwise.
print(x.islower())

Hello world
HELLO WORLD
True


f-strings are also pretty useful, allowing us to interpolate values into our strings:

In [11]:
name = 'Cage'
print(f'They call him {name}')

They call him Cage
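
As a further sketch of what f-strings can do, the braces can hold any expression, and an optional format specification after a colon controls how the value is rendered. The variable names and values below are made up for illustration:

```python
name = 'Cage'
price = 1234.5678
ratio = 0.256

# any expression can be evaluated inside the braces
greeting = f'{name.upper()} has {len(name)} letters'
# a format spec after the colon controls how the value is rendered
formatted_price = f'{price:,.2f}'   # thousands separator, 2 decimal places
formatted_ratio = f'{ratio:.1%}'    # percentage with 1 decimal place

print(greeting)
print(formatted_price)
print(formatted_ratio)
```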


##### Boolean Values
Python boolean values are either True or False (note the capitalization).

In [12]:
# assigning true to a variable
booleans_are_capitalized = True
booleans_are_capitalized

True

##### The Null Object - None
None is a special object that represents the absence of a value. It is also what functions (explained later) return when they don't explicitly return a value.

In [13]:
x = None
print(x)

None
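
A small sketch of where None typically shows up. Functions are covered later in this tutorial, and `log_message` is a hypothetical helper written only for this example:

```python
def log_message(message):
    # there is no return statement, so calling this function returns None
    print(message)

result = log_message('saving...')
print(result)

# None is a single shared object, so the idiomatic check uses "is" rather than "=="
if result is None:
    print('nothing was returned')
```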


### Python sequence types
Here we'll describe some of the Python types used to store collections of data:
- Lists
- Tuples
- Sets
- Range

#### Lists
Lists are the easiest way to store collections of values. Lists can hold heterogeneous values, are variable in length, and are mutable.

In [14]:
# declaring a list
sample_list = [1,2,"three", True]
sample_list

[1, 2, 'three', True]

You can access elements in most Python sequences by indexing and slicing:

In [15]:
sample_list = [1,2,"three", True]
# Get the value at index 0
print(sample_list[0])
# Get the values from index 0 to index 2
print(sample_list[0:3])
# Get the value at the last index
print(sample_list[-1])
# Get all values in reverse order
print(sample_list[::-1])

1
[1, 2, 'three']
True
[True, 'three', 2, 1]
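
Because lists are mutable and variable in length, they can also be changed in place. A minimal sketch, reusing the same sample values:

```python
sample_list = [1, 2, 'three', True]

# lists are mutable: replace an element in place
sample_list[2] = 3
# lists are variable in length: append adds an element to the end
sample_list.append('five')
# pop removes the last element and returns it
last = sample_list.pop()

print(sample_list)
print(last)
print(len(sample_list))
# a slice with a step of 2 takes every second element
print(sample_list[::2])
```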


#### Tuples
Tuples are similar to lists, but they are immutable. Tuples can also be unpacked into several variables in a single line.

In [16]:
x = (1,2,True)
print(x[1])

# If a tuple has only one element it must have a comma
y = (1,)
print(y)

# Cannot change an element in a tuple as it's immutable
try:
    x[1] = 7
except TypeError as exception:
    print(exception)
    
# tuple unpacking
tpl = (1,2,3)
a,b,c = tpl
print(a)
print(b)
print(c)

2
(1,)
'tuple' object does not support item assignment
1
2
3


#### Sets
Sets are also similar to lists, but they are unordered and won't contain any duplicate values:

In [17]:
# Creating a set with duplicate 2 values:
x = {1,2,2,3}
print(x)

{1, 2, 3}
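
Sets also support fast membership tests and the usual set algebra. The sketch below uses made-up ticker names purely for illustration:

```python
portfolio_a = {'AAPL', 'MSFT', 'VOD'}
portfolio_b = {'VOD', 'BP', 'MSFT'}

# membership test
print('BP' in portfolio_a)

# intersection, union and difference
in_both = portfolio_a & portfolio_b
in_either = portfolio_a | portfolio_b
only_in_a = portfolio_a - portfolio_b

# sets are unordered, so sort them to print in a predictable order
print(sorted(in_both))
print(sorted(in_either))
print(sorted(only_in_a))

# a common use: removing duplicates from a list
unique_ids = sorted(set([3, 1, 3, 2, 1]))
print(unique_ids)
```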


#### Range
Range is a sequence type that represents an immutable sequence of numbers. It's normally used to generate the numbers used in for loops:

In [18]:
# from 0 to 9
print(list(range(10)))
# from 9 to 0
print(list(range(9, -1, -1)))
# even numbers from 2 up to (but not including) 10
print(list(range(2, 10, 2)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[2, 4, 6, 8]


### Python Mapping Types
dict is Python's built-in mapping type. Dicts store key-value pairs where keys must be [hashable](https://docs.python.org/3/glossary.html#term-hashable), and values can be any type. Looking up a value by key in a dictionary is usually easier and more efficient than searching for it in a sequence type.

In [81]:
dictionary = {'banana':True, 7:'hello_world'}
print(dictionary)

print(dictionary['banana'])
dictionary['apple'] = False
print(dictionary)

{'banana': True, 7: 'hello_world'}
True
{'banana': True, 7: 'hello_world', 'apple': False}


In [82]:


# access values by key
print(dictionary['apple'])

# Can also use get to grab values
print(dictionary.get('apple'))

# and provide a default value if key doesn't exist.
print(dictionary.get('appel', 'sensible default value'))

False
False
sensible default value
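
Dictionaries can also be tested for membership, iterated over and modified in place. A minimal sketch with made-up data:

```python
prices = {'apple': 0.5, 'banana': 0.25}

# membership tests check the keys, not the values
print('apple' in prices)

# items() yields (key, value) tuples, which can be unpacked in the loop
for fruit, price in prices.items():
    print(f'{fruit}: {price}')

# update an existing value, then delete a key
prices['apple'] = 0.6
del prices['banana']
print(prices)
```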


### Decision statements
Python allows you to control the flow of the program. The simplest way is by using 'if' statements:

In [20]:
# try changing the value of expression to see what this code does!
expression = True
# if expression evaluates to True then print hello world
# otherwise do nothing
if expression:
    print('hello world')

hello world


In [21]:
# try changing value to see what this code does!
value = 1
# if value == 0 evaluates to True then print value is 0
# otherwise continue to elif statement
if value == 0:
    print('value is 0')
# if the value is 1
# print value is 1
elif value == 1:
    print('value is 1')


value is 1


In [22]:
# try changing value to see what this code does!
value = 'banana'
# if value == 0 evaluates to True then print value is 0
# otherwise continue to elif statement
if value == 0:
    print('value is 0')
# if the value is 1
# print value is 1
# otherwise continue to next elif statement or else statement if no more elifs.
elif value == 1:
    print('value is 1')
else:
    print('banana')


banana
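
Conditions can also be combined with `and`, `or` and `not`, and empty containers count as False. A short sketch, with illustrative variable names:

```python
value = 5
basket = []

# combine conditions with "and" / "or" / "not"
if value > 0 and value % 2 == 1:
    label = 'positive and odd'
elif value > 0:
    label = 'positive and even'
else:
    label = 'zero or negative'
print(label)

# empty containers, 0, '' and None are all treated as False in a condition
if not basket:
    basket_status = 'basket is empty'
else:
    basket_status = 'basket has items'
print(basket_status)
```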


### Loops

Python has 2 main types of loop:
 - For loops.\
For loops work by iterating over a sequence of values. It's common to combine the range sequence type with a for loop, but any sequence type can be used:

In [23]:
# print the numbers 0 - 9
for i in range(10):
    print(i)
    
animals = ['cow', 'sheep', 'alligator']
for animal in animals:
    print(animal)

# tuple unpacking is commonly used in for loops:
# here enumerate returns a sequence of tuples
# where each tuple contains an animal and its index in the list
for index, animal in enumerate(animals):
    print(index)
    print(animal)

habitats = ['farms', 'fields', 'swamps']
# zip can be used to loop over two lists simultaneously:
for animal, habitat in zip(animals, habitats):
    print(animal)
    print(habitat)
    

0
1
2
3
4
5
6
7
8
9
cow
sheep
alligator
0
cow
1
sheep
2
alligator
cow
farms
sheep
fields
alligator
swamps


- While loops \
While loops check an expression before every iteration, and exit the loop when the expression evaluates to False.

In [24]:
i = 0
# print the numbers 1 - 10 then stop
while i < 10:
    print('running')
    i += 1
    print(i)
print('stopped')

running
1
running
2
running
3
running
4
running
5
running
6
running
7
running
8
running
9
running
10
stopped


Common to both for loops and while loops are the _break_ and _continue_ keywords.
_break_ causes the program to exit the loop.
_continue_ causes the current iteration of the loop to end, and the next to begin.

In [83]:
# print 0 - 3 then exit loop
for i in range(10):
    print(i)
    if i == 3:
        break

0
1
2
3


In [84]:
      
# print 0 - 9 , but skip the number 3
for i in range(10):
    if i==3:
        continue
    print(i)

0
1
2
4
5
6
7
8
9
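
The two keywords are often used together. The sketch below, using a made-up list of readings, skips negative values with continue and stops at the first zero with break:

```python
readings = [4, -1, 7, 0, 12, 9]
positives = []

for reading in readings:
    if reading < 0:
        # skip this value and move on to the next iteration
        continue
    if reading == 0:
        # a zero marks the end of the useful data, so leave the loop
        break
    positives.append(reading)

print(positives)
```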


### Functions
Functions are a great way of re-using code.

In Python, a function can declare "parameters": variables that receive the values (the "arguments") passed in when the function is called. A function can hand a value back to the caller using the return statement, which allows you to use the result of a function call later in your code.
If you don't provide a return statement, the function returns None.

In [26]:
# a fairly useless function that just prints something to the terminal
def print_flux_capacitor():
    print('flux capacitor')
    
# call the function
print_flux_capacitor()

# print the sum of 2 numbers:
def print_sum(a, b):
    print(a+b)

# prints 3
print_sum(1,2)
# pass a predefined variable to a function
x = 7
# prints 8
print_sum(1, x)

# return the sum of two numbers
def return_sum(a, b):
    return a + b
    
# store result of function in a variable
result = return_sum(1,2)
print(result)

flux capacitor
3
8
3


The special syntax \*args in a function definition collects any number of extra positional arguments into a tuple.

The special syntax \*\*kwargs in a function definition collects any number of extra keyword arguments into a dictionary.

In [95]:
# prints args and kwargs
def fancy_func(*args, **kwargs):
    print(f'args: {args}')
    print(f'kwargs: {kwargs}')
    
fancy_func('x', 1, 2, fruit = 'apple', a = 'b')

args: ('x', 1, 2)
kwargs: {'fruit': 'apple', 'a': 'b'}
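
The same symbols work in the other direction when calling a function: `*` unpacks a sequence into positional arguments and `**` unpacks a dictionary into keyword arguments. A minimal sketch, where `total` and the `scale` keyword are hypothetical names chosen for this example:

```python
def total(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs is a dict of keyword arguments
    scale = kwargs.get('scale', 1)
    return sum(args) * scale

numbers = [1, 2, 3]
options = {'scale': 10}

# same as total(1, 2, 3)
plain = total(*numbers)
# same as total(1, 2, 3, scale=10)
scaled = total(*numbers, **options)

print(plain)
print(scaled)
```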


### Classes and objects
Generally, in larger pieces of code, we model things as "objects", which use "attributes" to hold the state of the object and behaviours (functions) to describe what the object can do.

Classes define what attributes and behaviours an object can have. For example, we might define a Person class, which says that people have heights, hair colours and can walk. An object in this example would be my friend Shawn, who is 180cm tall, has blonde hair and will walk to the local cafe every so often.

Let's create a person class, and some people objects:

In [27]:
# defining our class
class Person:
    # __init__ is the constructor in Python - it's how we initialise our Person object and set initial attribute values
    # All object functions include a self parameter, which allows us to access the attributes and behaviours of our object
    def __init__(self, height_in_cm, hair_colour):
        self.height_in_cm = height_in_cm
        self.hair_colour = hair_colour
        # we'll also set a current location attribute that defaults to home
        self.location = 'home'
    # let's say our people can walk to a location
    def walk(self, location):
        self.location = location
    # We'll also add a function that tells Python how to print our object in a human readable way
    # this is called a dunder function; the details are outside the scope of this course.
    def __repr__(self):
        return f'Person(height_in_cm:{self.height_in_cm}, hair_colour:{self.hair_colour}, location:{self.location})'

# Let's model my friend Shawn:
shawn = Person(180, 'blonde')

# let's see what shawn looks like:
print(shawn)
# we'll send shawn to grab a coffee - he should move to the cafe:
shawn.walk('cafe')
print(shawn)

# we can also access attributes individually 
print(shawn.location)

# or create someone else:
dean = Person(195, 'brunette')
print(f'Dean is a {dean}')

Person(height_in_cm:180, hair_colour:blonde, location:home)
Person(height_in_cm:180, hair_colour:blonde, location:cafe)
cafe
Dean is a Person(height_in_cm:195, hair_colour:brunette, location:home)


## Using NumPy to efficiently work with large, multi-dimensional data
This part of the course will serve as a short introduction to NumPy, a widely used Python module which leverages C code to efficiently process multi-dimensional arrays of data. You don't need to learn C to write efficient code: NumPy takes care of this for you.

This part of the course will cover:
 - What are NumPy arrays
 - Creating NumPy arrays
 - Basic vectorized operations
 - Indexing, slicing and iterating

### What are NumPy arrays?
While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be of the same type in order to improve efficiency.

An array is a grid of values and it contains information about the raw data and how to locate an element. The elements are all of the same type, referred to as the array dtype. Note, this will not be one of the built-in Python types.
 
The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension.

### Creating NumPy arrays

Let's create some NumPy arrays and explore their structure:


In [28]:
# let's print the arrays we create, along with array metadata
def describe_np(array):
    # tuple of integers giving size of the array across each dimension
    print(f'shape: {array.shape}')
    # rank
    print(f'rank: {array.ndim}')
    # type of data
    print(f'dtype: {array.dtype.name}')
    # size in bytes of each entry in array
    print(f'itemsize: {array.itemsize}')
    # number of elements in array
    print(f'size: {array.size}')
    print('-'*20)
    print(f'a:{array}')
    print('-'*20)
    

In [29]:
import numpy as np

# create a numpy array with numbers from 0 to 14
# of rank 2
# with the first dimension of size 3,
# and second dimension of size 5

a = np.arange(15).reshape(3, 5)
describe_np(a)



shape: (3, 5)
rank: 2
dtype: int64
itemsize: 8
size: 15
--------------------
a:[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
--------------------


In [30]:
# create np array from a Python list
a = np.array([2, 3, 4])
describe_np(a)

shape: (3,)
rank: 1
dtype: int64
itemsize: 8
size: 3
--------------------
a:[2 3 4]
--------------------


In [31]:
b = np.array([1.2, 3.5, 5.1])
# notice dtype is float64
describe_np(b)

shape: (3,)
rank: 1
dtype: float64
itemsize: 8
size: 3
--------------------
a:[1.2 3.5 5.1]
--------------------


In [32]:
# numpy will auto convert sequences of sequences to a 2-dimensional array
# this also applies to more deeply nested sequences in higher dimensions
c = np.array([(1.5, 2, 3), (4, 5, 6)])
describe_np(c)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[1.5 2.  3. ]
 [4.  5.  6. ]]
--------------------


In [33]:
# we can define the type when we create an array
d = np.array([1, 2], dtype=np.float64)
describe_np(d)

shape: (2,)
rank: 1
dtype: float64
itemsize: 8
size: 2
--------------------
a:[1. 2.]
--------------------


In [34]:
# we can use zeros to create an array filled with zeros
# we have to pass the shape of the array to the zeros fn
e = np.zeros((2,3))
describe_np(e)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[0. 0. 0.]
 [0. 0. 0.]]
--------------------


In [35]:
# ones does the same, but fills the array with ones
f = np.ones((2,3))
describe_np(f)

shape: (2, 3)
rank: 2
dtype: float64
itemsize: 8
size: 6
--------------------
a:[[1. 1. 1.]
 [1. 1. 1.]]
--------------------


In [36]:
# we can use arange to create a sequence of integers - similar to range in Python
g = np.arange(4)
describe_np(g)

shape: (4,)
rank: 1
dtype: int64
itemsize: 8
size: 4
--------------------
a:[0 1 2 3]
--------------------


In [37]:
# use linspace to do the same for floating point sequences
h = np.linspace(0, 2, 9) # 9 numbers from 0 to 2
describe_np(h)

shape: (9,)
rank: 1
dtype: float64
itemsize: 8
size: 9
--------------------
a:[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
--------------------


### Basic vectorized operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [38]:
a = np.array([20, 30, 40, 50])
b = np.arange(4) #[0, 1, 2, 3]
# returns a new array
# with values:
# [
# 20 - 0
# 30 - 1
# 40 - 2
# 50 - 3
#]
c = a - b
describe_np(c)
print('')
describe_np(b**2)
print('')
describe_np(a < 35)

shape: (4,)
rank: 1
dtype: int64
itemsize: 8
size: 4
--------------------
a:[20 29 38 47]
--------------------

shape: (4,)
rank: 1
dtype: int64
itemsize: 8
size: 4
--------------------
a:[0 1 4 9]
--------------------

shape: (4,)
rank: 1
dtype: bool
itemsize: 1
size: 4
--------------------
a:[ True  True False False]
--------------------
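
To make the elementwise behaviour concrete, the sketch below compares a vectorized calculation with the equivalent pure-Python loop. The prices and quantities are made-up values:

```python
import numpy as np

prices = np.array([20.0, 30.0, 40.0, 50.0])
quantities = np.array([1, 2, 3, 4])

# multiply the two arrays element by element
values = prices * quantities
# a scalar is applied to every element
discounted = values * 0.5

# the equivalent calculation with a plain Python loop
loop_values = [price * quantity for price, quantity in zip(prices, quantities)]

print(values)
print(discounted)
print(loop_values == values.tolist())
```

The vectorized form is shorter and, for large arrays, typically much faster, because the loop runs inside NumPy's compiled code.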


Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

In [39]:
a = np.array([20, 30, 40, 50])
print(a.sum())
print(a.min())
print(a.max())

140
20
50


In [40]:
# sum along one dimension
b = np.ones((2,3))
# sum all values across first dimension
describe_np(b.sum(axis = 0))

shape: (3,)
rank: 1
dtype: float64
itemsize: 8
size: 3
--------------------
a:[2. 2. 2.]
--------------------
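
The axis argument names the dimension that is collapsed by the operation. A small sketch on an array with distinct values makes the difference between the two axes easier to see:

```python
import numpy as np

# [[0 1 2]
#  [3 4 5]]
m = np.arange(6).reshape(2, 3)

# axis=0 collapses the rows, giving one total per column
column_totals = m.sum(axis=0)
# axis=1 collapses the columns, giving one total per row
row_totals = m.sum(axis=1)

print(column_totals)
print(row_totals)
```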


### Indexing, slicing and iterating
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [41]:
a = np.arange(10)
describe_np(a[2])
print('')
describe_np(a[2:5])

shape: ()
rank: 0
dtype: int64
itemsize: 8
size: 1
--------------------
a:2
--------------------

shape: (3,)
rank: 1
dtype: int64
itemsize: 8
size: 3
--------------------
a:[2 3 4]
--------------------


Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

In [42]:
b = np.array([[1,2,3],[4,5,6]])
describe_np(b)
print('')
describe_np(b[0,0])
print('')
# all values in column 1
describe_np(b[:, 1])

shape: (2, 3)
rank: 2
dtype: int64
itemsize: 8
size: 6
--------------------
a:[[1 2 3]
 [4 5 6]]
--------------------

shape: ()
rank: 0
dtype: int64
itemsize: 8
size: 1
--------------------
a:1
--------------------

shape: (2,)
rank: 1
dtype: int64
itemsize: 8
size: 2
--------------------
a:[2 5]
--------------------


Iterating over multidimensional arrays is done with respect to the first axis:



In [43]:
b = np.array([[1,2,3],[4,5,6]])
for index, row in enumerate(b):
    print(f'row {index}: {row}')

row 0: [1 2 3]
row 1: [4 5 6]


Or we can use the flat attribute to iterate over all elements in an array:

In [44]:
b = np.array([[1,2,3],[4,5,6]])
for index, element in enumerate(b.flat):
    print(f'element {index}: {element}')

element 0: 1
element 1: 2
element 2: 3
element 3: 4
element 4: 5
element 5: 6


## Using Pandas to work with dataframes

We'll explore the widely used pandas library in this part of the course. We'll cover:
- Intro to pandas
- Structure of a dataframe
- Creating dataframes
- Viewing dataframes
- Slicing dataframes
- Joining dataframes

### Intro to pandas
Pandas aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language.

Most of the functionality of pandas is delivered through DataFrame and Series objects.

### Structure of a dataframe
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.

Rows are labelled with an index, which can be of any type - by default this is a sequential set of integers.

In [45]:
import pandas as pd
df = pd.DataFrame([{'name':'Nickols', 'height in cm':123, 'hair colour': 'red'},
                   {'name':'Benjals', 'height in cm':200, 'hair colour': 'black'},
                   {'name':'Dennisons', 'height in cm':180, 'hair colour': 'brunette'}])
df.head()

|   | name | height in cm | hair colour |
|---|---|---|---|
| 0 | Nickols | 123 | red |
| 1 | Benjals | 200 | black |
| 2 | Dennisons | 180 | brunette |


Every column in a pandas dataframe is a Series object:

In [46]:
print(df['name'])
print(type(df['name']))

0      Nickols
1      Benjals
2    Dennisons
Name: name, dtype: object
<class 'pandas.core.series.Series'>


Here the name column is a Series with 3 values.

You can create a Series from scratch as well:

In [47]:
ages = pd.Series([23, 45, 37])
print(ages)

0    23
1    45
2    37
dtype: int64


### Creating dataframes

We can create a DataFrame from iterables of iterables (like a list of dictionaries), or NumPy arrays:

In [48]:
data = [[i + 3*j for i in range(3)]for j in range(4)]
df = pd.DataFrame(data)
df.head()

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0 | 1 | 2 |
| 1 | 3 | 4 | 5 |
| 2 | 6 | 7 | 8 |
| 3 | 9 | 10 | 11 |


pandas automatically names the columns and indexes if we don't provide them.

In [49]:
data = [[i + 3*j for i in range(3)]for j in range(4)]
df = pd.DataFrame(data, index = ['Lisa', 'Bart', 'Maggie', 'Nelson'], columns = ['Age','Salary','Grade'])
df.head()

|   | Age | Salary | Grade |
|---|---|---|---|
| Lisa | 0 | 1 | 2 |
| Bart | 3 | 4 | 5 |
| Maggie | 6 | 7 | 8 |
| Nelson | 9 | 10 | 11 |


We've provided our own indices and columns for this data.

We can do the same using a 2D NumPy array:

In [50]:
df = pd.DataFrame(np.random.randn(2, 3), columns = ['sin', 'cos', 'tan'])
df.head()

|   | sin | cos | tan |
|---|---|---|---|
| 0 | 0.871247 | -0.245586 | -0.39392 |
| 1 | 0.936011 | 1.903372 | 0.294804 |


We've created a dataframe with random floats and our own column names, with default indices.

We can also create dataframes from CSV and Excel files:

In [51]:
pd.read_csv('_data/simpsons.csv')

|   | Character | Profession | Age | Alignment |
|---|---|---|---|---|
| 0 | Bart | Student | 8 | Chaotic good |
| 1 | Homer | Nuclear technician | 47 | Neutral good |
| 2 | Lisa | Student | 9 | Lawful good |
| 3 | Nelson | Student | 10 | Chaotic evil |
| 4 | Mongomery | Billionare | 80 | Lawful evil |
| 5 | Moe | Bartender | 65 | Neutral neutral |
| 6 | Barney | Unemployed | 35 | Chaotic neutral |


Pandas has loaded our CSV file into a dataframe and inferred the column names.

In [52]:
pd.read_excel('_data/Futurama.xlsx')

|   | Character | Profession | Age |
|---|---|---|---|
| 0 | Fry | Delivery person | 27 |
| 1 | Bender | Robot | timeless |
| 2 | Farnsworth | Professor | 300 |


Pandas has automatically loaded our Excel file into a dataframe and inferred the column names.

### Viewing dataframes
Use DataFrame.head() and DataFrame.tail() to view the top and bottom rows of the frame respectively:

In [53]:
df = pd.read_csv('_data/simpsons.csv')
df.head()

|   | Character | Profession | Age | Alignment |
|---|---|---|---|---|
| 0 | Bart | Student | 8 | Chaotic good |
| 1 | Homer | Nuclear technician | 47 | Neutral good |
| 2 | Lisa | Student | 9 | Lawful good |
| 3 | Nelson | Student | 10 | Chaotic evil |
| 4 | Mongomery | Billionare | 80 | Lawful evil |


In [54]:
df.tail(3)

|   | Character | Profession | Age | Alignment |
|---|---|---|---|---|
| 4 | Mongomery | Billionare | 80 | Lawful evil |
| 5 | Moe | Bartender | 65 | Neutral neutral |
| 6 | Barney | Unemployed | 35 | Chaotic neutral |


Display the DataFrame.index or DataFrame.columns:

In [55]:
print(df.index)
print(df.columns)

RangeIndex(start=0, stop=7, step=1)
Index(['Character', 'Profession', 'Age', 'Alignment'], dtype='object')


describe() shows a quick statistical summary of the numeric columns in your data:

In [56]:
df.describe()

|   | Age |
|---|---|
| count | 7.0 |
| mean | 36.285714 |
| std | 29.118804 |
| min | 8.0 |
| 25% | 9.5 |
| 50% | 35.0 |
| 75% | 56.0 |
| max | 80.0 |


DataFrame.sort_values() sorts by values:

In [57]:
df.sort_values(by='Age')

|   | Character | Profession | Age | Alignment |
|---|---|---|---|---|
| 0 | Bart | Student | 8 | Chaotic good |
| 2 | Lisa | Student | 9 | Lawful good |
| 3 | Nelson | Student | 10 | Chaotic evil |
| 6 | Barney | Unemployed | 35 | Chaotic neutral |
| 1 | Homer | Nuclear technician | 47 | Neutral good |
| 5 | Moe | Bartender | 65 | Neutral neutral |
| 4 | Mongomery | Billionare | 80 | Lawful evil |


### Slicing dataframes

We can select a single column from a dataframe using square brackets, which returns a Series:

In [58]:
df['Age']

0     8
1    47
2     9
3    10
4    80
5    65
6    35
Name: Age, dtype: int64

And index into a Series to return a single value:

In [59]:
age_series = df['Age']
age_series[0]

8

You can pass a list of columns to [] to select columns in that order. 

In [60]:
df[['Character','Age']]

|   | Character | Age |
|---|---|---|
| 0 | Bart | 8 |
| 1 | Homer | 47 |
| 2 | Lisa | 9 |
| 3 | Nelson | 10 |
| 4 | Mongomery | 80 |
| 5 | Moe | 65 |
| 6 | Barney | 35 |


Use loc to access a subset of rows in the dataframe by label. loc slicing is similar to array slicing, except that both the start and the end label are included.

In [61]:
df.loc[1]

Character                  Homer
Profession    Nuclear technician
Age                           47
Alignment           Neutral good
Name: 1, dtype: object

In [62]:
df.loc[1:3, 'Profession':'Alignment']

|   | Profession | Age | Alignment |
|---|---|---|---|
| 1 | Nuclear technician | 47 | Neutral good |
| 2 | Student | 9 | Lawful good |
| 3 | Student | 10 | Chaotic evil |
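
loc also accepts a boolean Series, which selects the rows where a condition is True. The sketch below builds a small stand-in dataframe with string labels rather than reading the CSV file, so the data and index labels are assumptions made for illustration:

```python
import pandas as pd

people = pd.DataFrame(
    {'Character': ['Bart', 'Homer', 'Lisa'], 'Age': [8, 47, 9]},
    index=['a', 'b', 'c'],
)

# loc slices by label and includes both the start and the end label
first_two = people.loc['a':'b']
# a boolean Series selects rows; the second argument selects the column
adults = people.loc[people['Age'] >= 18, 'Character']

print(first_two)
print(adults)
```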


### Joining dataframes


The concat function does all the heavy lifting of concatenating similar dataframes together along an axis:

In [63]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)


df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    index=[4, 5, 6, 7],
)


df3 = pd.DataFrame(
    {
        "A": ["A8", "A9", "A10", "A11"],
        "B": ["B8", "B9", "B10", "B11"],
        "C": ["C8", "C9", "C10", "C11"],
        "D": ["D8", "D9", "D10", "D11"],
    },
    index=[8, 9, 10, 11],
)

pd.concat([df1, df2, df3], join='outer')

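|   | A | B | C | D |
|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |
| 8 | A8 | B8 | C8 | D8 |
| 9 | A9 | B9 | C9 | D9 |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |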


The dataframes have been concatenated along axis 0 (the rows).

When gluing together multiple DataFrames, you have a choice of how to handle the other axes (other than the one being concatenated). This can be done in the following two ways:
- Take the union of them all, join='outer'. This is the default option as it results in zero information loss.
- Take the intersection, join='inner'.

In [64]:
df4 = pd.DataFrame(
    {
        "E": ["B2", "B3", "B6", "B7"],
        "F": ["D2", "D3", "D6", "D7"],
        "G": ["F2", "F3", "F6", "F7"],
    },
    index=[2, 3, 6, 7],
)


pd.concat([df1, df4], axis=1, join = 'outer')

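|   | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 0 | A0 | B0 | C0 | D0 | NaN | NaN | NaN |
| 1 | A1 | B1 | C1 | D1 | NaN | NaN | NaN |
| 2 | A2 | B2 | C2 | D2 | B2 | D2 | F2 |
| 3 | A3 | B3 | C3 | D3 | B3 | D3 | F3 |
| 6 | NaN | NaN | NaN | NaN | B6 | D6 | F6 |
| 7 | NaN | NaN | NaN | NaN | B7 | D7 | F7 |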


In [65]:
df4 = pd.DataFrame(
    {
        "D": ["B2", "B3", "B6", "B7"],
        "E": ["D2", "D3", "D6", "D7"],
        "F": ["F2", "F3", "F6", "F7"],
    },
    index=[2, 3, 6, 7],
)


pd.concat([df1, df4], axis=1, join = 'inner')

|   | A | B | C | D | D | E | F |
|---|---|---|---|---|---|---|---|
| 2 | A2 | B2 | C2 | D2 | B2 | D2 | F2 |
| 3 | A3 | B3 | C3 | D3 | B3 | D3 | F3 |


Concat with join type inner only returns rows with matching indices.

We can also join or merge two pandas DataFrames:

pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame or named Series objects - let's create a couple of objects to merge:

In [66]:
left = pd.DataFrame({"Character": ['Bart', 'Marge'], "Id": [1, 2]})
left

|   | Character | Id |
|---|---|---|
| 0 | Bart | 1 |
| 1 | Marge | 2 |


In [67]:
right = pd.DataFrame({"Id": [1, 5, 6], "Profession": ['Student', 'Police Officer', 'Doctor']})
right

|   | Id | Profession |
|---|---|---|
| 0 | 1 | Student |
| 1 | 5 | Police Officer |
| 2 | 6 | Doctor |


In [68]:
pd.merge(left, right, on="Id", how="outer")

|   | Character | Id | Profession |
|---|---|---|---|
| 0 | Bart | 1 | Student |
| 1 | Marge | 2 | NaN |
| 2 | NaN | 5 | Police Officer |
| 3 | NaN | 6 | Doctor |


Here we join our two tables in an outer join on the Id key in both tables, so all rows are returned, with missing values set to null.

In [69]:
pd.merge(left, right, on="Id", how="inner")

|   | Character | Id | Profession |
|---|---|---|---|
| 0 | Bart | 1 | Student |


Here we perform an inner join, so we only get rows that have a matching Id in both tables.

In [70]:
pd.merge(left, right, on="Id", how="left")

|   | Character | Id | Profession |
|---|---|---|---|
| 0 | Bart | 1 | Student |
| 1 | Marge | 2 | NaN |


Here we perform a left join, so all rows in the left table are returned, along with only the matching rows in the right table.

In [71]:
pd.merge(left, right, on="Id", how="right")

|   | Character | Id | Profession |
|---|---|---|---|
| 0 | Bart | 1 | Student |
| 1 | NaN | 5 | Police Officer |
| 2 | NaN | 6 | Doctor |


Conversely, in a right join, all rows in the right table are returned, but only the matching info on the left table is returned.
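
When checking the result of a join, it can help to see which table each row came from. As a sketch, the `indicator` parameter of merge adds a `_merge` column for this purpose; the example rebuilds the same two small tables so that it runs on its own:

```python
import pandas as pd

left = pd.DataFrame({"Character": ['Bart', 'Marge'], "Id": [1, 2]})
right = pd.DataFrame({"Id": [1, 5, 6], "Profession": ['Student', 'Police Officer', 'Doctor']})

# indicator=True adds a _merge column with the values
# 'both', 'left_only' or 'right_only' for each row
merged = pd.merge(left, right, on="Id", how="outer", indicator=True)
print(merged)

# map each Id to the table(s) it was found in
origin_by_id = dict(zip(merged['Id'], merged['_merge'].astype(str)))
print(origin_by_id)
```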

## Using the Python LUSID SDKs

We provide many tools for interacting with LUSID in Python and Jupyter.

The majority of the LUSID SDKs are automatically generated using the OpenAPI Generator. These SDKs provide a set of objects and functions which enable you to call our APIs and interact with LUSID, without having to write any REST communication code. We also provide other manually-written packages which make interacting with LUSID programmatically easier.
In this section we'll:
- Introduce the OpenAPI SDKs, describing how they are used.
- Describe some of the tools for interacting with LUSID in our hosted Jupyter environment.

### The OpenAPI SDKs
We have a set of SDKs that are auto-generated using the OpenAPI Generator project, such as lusid-sdk-python, drive-sdk-python and finbourne-access-sdk-python, all hosted on PyPI. These SDKs provide api objects, with methods that can be used to perform actions in LUSID.

All authenticated calls to the LUSID API require an OpenID Connect ID token, which is issued from your token issuer URL. The details of these can be found on your LUSID portal under "Applications" within the "Identity and Access Management" section.

In [72]:
import lusid
from lusid.utilities import RefreshingToken
from lusid.utilities import ApiConfigurationLoader

try:
    # load our access details from a secrets file or environment variables
    config = ApiConfigurationLoader.load()

    # build an api factory that authenticates with a refreshing token
    api_factory = lusid.utilities.ApiClientFactory(
        token = RefreshingToken(config)
    )

    # list the names of the available API classes
    print([api for api in dir(lusid.api) if "Api" in api])
except Exception as e:
    print(e)



quote_from_bytes() expected bytes


Here's an example using the LUSID Python SDK. We initialise an api factory using our access details, which can be provided either in a secrets JSON file or as environment variables. Here we have some environment variables set, so we use RefreshingToken to grab a token using them.

We then print a list of the available APIs in the LUSID preview SDK.



In [75]:
try:
    api_instance = api_factory.build(lusid.api.AggregationApi)
    scope = 'scope_example' # str | The scope of the portfolio
    code = 'code_example' # str | The code of the portfolio
    create_recipe_request = {"recipeCreationMarketDataScopes":["MyScope"],"recipeId":{"scope":"MyScope","code":"default"},"asAt":"2018-03-05T00:00:00.0000000+00:00","effectiveAt":"2018-03-05T00:00:00.0000000+00:00"} # CreateRecipeRequest | The request specifying the parameters to generating the recipe (optional)


    # [EXPERIMENTAL] GenerateConfigurationRecipe: Generates a recipe sufficient to perform valuations for the given portfolio.
    api_response = api_instance.generate_configuration_recipe(scope, code, create_recipe_request=create_recipe_request)
    print(api_response)
except lusid.rest.ApiException as e:
    print("Exception when calling AggregationApi->generate_configuration_recipe: %s\n" % e)

Exception when calling AggregationApi->generate_configuration_recipe: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 16 Nov 2022 16:49:01 GMT', 'Content-Type': 'application/problem+json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'lusid-meta-success': 'False', 'lusid-meta-requestId': '0HMM7SHJT111T:00000032', 'lusid-meta-correlationId': '0HMM7SHJT111T:00000032', 'lusid-meta-duration': '333', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains', 'Server': 'FINBOURNE', 'Content-Security-Policy': "default-src 'self' https://*.lusid.com https://*.finbourne.com; script-src 'unsafe-inline' 'self' https://*.lusid.com https://*.finbourne.com https://editor.swagger.io; font-src 'self' fonts.googleapis.com; img-src data: 'self' https://*.lusid.com https://*.finbourne.com https://validator.swagger.io; style-src 'unsafe-inline' 'self' https://*.lusid.com https://*.finbourne.com; report-uri https://lusid.report

Here, we use our api factory to build an aggregations API object.
We then call the generate configuration recipe method on this object, which communicates with one of our /api/aggregation/ endpoints to generate a config recipe.

### Tools for interacting with LUSID in our hosted Jupyter environment
We provide a Jupyter environment that you can access and use to interactively write Python and dotnet scripts. In our Jupyter environment, the preview SDKs come installed by default.

We provide the lusidjam library, which can be used to provide an authentication token without re-entering credentials into your Python scripts:

In [74]:
import lusidjam

api_factory = lusid.utilities.ApiClientFactory(
    token = lusidjam.RefreshingToken()
)

Here we've built an api factory using credentials stored in our Jupyter environment.

We also provide a custom magic command to query Luminesce:

In [None]:
%%luminesce
SELECT * FROM Lusid.Instrument.Equity LIMIT 10

This cell magic runs any statement in the cell below the magic command, displaying a pandas dataframe containing the output of the Luminesce query.

In [None]:
results = %luminesce SELECT * FROM Lusid.Instrument.Equity LIMIT 10
results.head()

This line magic runs the query on the same line after the magic command, allowing us to use the result of the query in our Python code.