# Python: External Sources

In [1]:
%%html
<style>
.dataframe th {
    font-size: 11px;
}
.dataframe td {
    font-size: 11px;
}
</style>

## 1. Operating system

### 1.1. Directory
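An absolute path starts from the root of the file system (on Windows, from a drive such as `D:/`). A relative path is resolved against the current working folder, so `..` refers to the parent of the working folder.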

In [3]:
# absolute path
'D:/Documents/Jupyter/tabular-book/data/iris.csv'

'D:/Documents/Jupyter/tabular-book/data/iris.csv'

In [1]:
# relative path
'../data/iris.csv'

'../data/iris.csv'
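The same distinction can be checked with the standard-library `pathlib` module. A minimal sketch; `../data/iris.csv` is just the example path from above and does not need to exist:

```python
from pathlib import Path

relative = Path('../data/iris.csv')   # interpreted relative to the working folder
absolute = Path.cwd() / relative       # anchoring it at the working folder makes it absolute

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
# resolve() normalizes '..' (and follows symlinks); the file does not have to exist
print(absolute.resolve())
```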

### 1.2. Operating system

In [4]:
import os
import glob
from sspipe import p, px

The `os.listdir()` function returns the names of all files and folders in the given folder (not recursively, and in arbitrary order). If no folder is passed, it lists the current working folder.

In [5]:
os.listdir('../image')[:5]

['0-raw-image.pptx',
 'acquisition_functions.png',
 'area_problem.png',
 'computational_graph.png',
 'confusion_matrix_binary.png']
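Because `os.listdir()` returns bare names in arbitrary order and mixes files with folders, it is common to sort the result and test each entry with `os.path.isfile()` / `os.path.isdir()`. A self-contained sketch on a temporary folder (the file names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # build a small folder: two files and one sub-folder
    for name in ['b.csv', 'a.csv']:
        open(os.path.join(folder, name), 'w').close()
    os.mkdir(os.path.join(folder, 'images'))

    entries = sorted(os.listdir(folder))  # names only, so sort for a stable order
    files = [e for e in entries if os.path.isfile(os.path.join(folder, e))]
    folders = [e for e in entries if os.path.isdir(os.path.join(folder, e))]

print(entries)  # ['a.csv', 'b.csv', 'images']
print(files)    # ['a.csv', 'b.csv']
print(folders)  # ['images']
```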

In [16]:
directory = '../data'
os.path.join(directory, '*.csv') | p(glob.glob) | p(sorted)[:5]

['data/boston.csv',
 'data/breast_cancer.csv',
 'data/cars.csv',
 'data/cars_40k.csv',
 'data/cpi.csv']
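For readers without `sspipe`, the pipe above should correspond to nested standard-library calls. A sketch on a temporary folder with made-up file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    for name in ['cpi.csv', 'boston.csv', 'notes.txt']:
        open(os.path.join(directory, name), 'w').close()

    # '*.csv' matches every name ending in .csv; glob returns full paths
    paths = sorted(glob.glob(os.path.join(directory, '*.csv')))[:5]
    names = [os.path.basename(path) for path in paths]

print(names)  # ['boston.csv', 'cpi.csv']
```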

In [17]:
os.listdir('..')

['.DS_Store',
 'Python',
 'Projects',
 'R',
 'Others',
 'data-science-blog',
 'courses',
 '.virtual_documents',
 'Courses_old',
 'AI Academy']

In [9]:
# join path components using the separator of the current OS (a backslash on Windows)
os.path.join('../data', 'iris.csv')

'../data\\iris.csv'

In [7]:
# convert a relative path to the absolute path
os.path.abspath('../data')

'D:\\Documents\\Jupyter\\tabular-book\\data'

In [8]:
# check whether a path exists
os.path.exists(r'../data/anaconda')

False
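Related helpers: `os.path.isdir()` narrows the check to folders, and `os.makedirs(..., exist_ok=True)` creates a missing folder (with any parents) without failing if it is already there, which is useful before writing into a folder such as `../output`. A sketch inside a throwaway temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    output = os.path.join(root, 'output', 'logs')
    before = os.path.exists(output)       # False: nothing created yet
    os.makedirs(output, exist_ok=True)    # creates 'output' and 'output/logs'
    os.makedirs(output, exist_ok=True)    # second call is a no-op, no error
    after = os.path.isdir(output)

print(before, after)  # False True
```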

#### Renaming files

In [2]:
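import os
import re

def order_add_1(start):
    """Add 1 to the first number in every file name in the working folder whose number is >= start.

    Numbers are zero-padded to two digits, e.g. '40-logging.ipynb' -> '41-logging.ipynb'.
    """
    numbered = []
    for name_old in os.listdir():
        match = re.search(r'\d+', name_old)  # raw string: '\d' is not a valid str escape
        if match is not None:
            numbered.append((int(match.group()), name_old))

    # rename from the largest number down, so 40 -> 41 never overwrites an existing 41
    for index_old, name_old in sorted(numbered, reverse=True):
        if index_old >= start:
            index_new = f'{index_old + 1:0>2}'
            name_new = re.sub(r'\d+', index_new, name_old, count=1)  # only the first number
            os.rename(name_old, name_new)

order_add_1(40)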
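Since `os.rename()` changes files immediately, it can help to preview the result first. A sketch that pulls the name logic into a hypothetical helper, `bump_number`, and prints the planned renames for some made-up file names instead of applying them:

```python
import re

def bump_number(name, start):
    """Return `name` with its first number increased by 1 if that number is >= start."""
    match = re.search(r'\d+', name)
    if match is None or int(match.group()) < start:
        return name
    new_index = f'{int(match.group()) + 1:0>2}'  # zero-pad to two digits (5 -> '05')
    return name[:match.start()] + new_index + name[match.end():]

names = ['39-pandas.ipynb', '40-logging.ipynb', '41-argparse.ipynb', 'README.md']
# dry run: print the plan instead of calling os.rename
for name in names:
    new_name = bump_number(name, 40)
    if new_name != name:
        print(f'{name} -> {new_name}')
```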

### 1.3. Logging
Logging is an important part of any program: it records detailed information about how the program runs. At first glance logging looks a lot like printing, but it stores messages in a more convenient and structured way (with a severity level, a timestamp, and a destination such as the console or a file). By logging useful data from the right places, you can not only debug errors easily but also use the data to analyze the performance of the application. Python ships a built-in module, [Logging], for this task.

[Logging]: https://docs.python.org/3/library/logging.html

#### Logging levels
The Logging module has five standard [logging levels] indicating how severe an event is (sorted by ascending severity). Each level is an integer constant (`logging.DEBUG` is 10, `logging.INFO` 20, `logging.WARNING` 30, `logging.ERROR` 40, `logging.CRITICAL` 50); I highly recommend referring to them by name rather than by number for readability.
- `DEBUG` Diagnostic and troubleshooting information. Not necessary when running the application, but useful for debugging.
- `INFO` Messages stating what happened while the program is running. For example: LightGBM has beaten the other candidates and become the best algorithm.
- `WARNING` The job returns output in the expected format, but the output may not be logically correct. Data drift or data full of zeros are practical use cases for warning messages.
- `ERROR` The job/script fails to run. For example, the input data for a job does not exist.
- `CRITICAL` Reserved for serious problems, for example when the program itself may be unable to continue running.

Each level has a corresponding function for recording that type of event (`logging.debug()`, `logging.info()`, and so on). By default, the Logging module only shows messages at level WARNING or higher. We can easily lower this threshold with `basicConfig(level=...)`.

[logging levels]: https://docs.python.org/3/library/logging.html#levels
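A small sketch to make the threshold concrete; `level_demo` is a hypothetical logger name:

```python
import logging

# every standard level is just an integer; a higher number means more severe
levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]
for level in levels:
    print(logging.getLevelName(level), level)  # DEBUG 10, INFO 20, ..., CRITICAL 50

# a logger handles a record only if the record's level >= the logger's level
demo = logging.getLogger('level_demo')  # hypothetical logger name
demo.setLevel(logging.WARNING)
print(demo.isEnabledFor(logging.INFO))   # False
print(demo.isEnabledFor(logging.ERROR))  # True
```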

In [1]:
import logging
logging.basicConfig(level=logging.INFO)

In [2]:
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')

INFO:root:This is an info message
WARNING:root:This is a warning message
ERROR:root:This is an error message
CRITICAL:root:This is a critical message


In [63]:
a = 5
b = 0

try:
    c = a / b
except ZeroDivisionError:
    logging.error("Exception occurred", exc_info=True)

ERROR:root:Exception occurred
Traceback (most recent call last):
  File "C:\Users\hungpq5\AppData\Local\Temp/ipykernel_40800/2671484370.py", line 5, in <module>
    c = a / b
ZeroDivisionError: division by zero
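Inside an `except` block, `logging.exception()` is a shortcut for `logging.error(..., exc_info=True)`. The sketch below captures the output in memory so it can be inspected; `exception_demo` is a hypothetical logger name:

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)   # write records into the in-memory buffer
log = logging.getLogger('exception_demo')  # hypothetical logger name
log.addHandler(handler)
log.propagate = False  # keep the demo output away from the root logger's handlers

try:
    1 / 0
except ZeroDivisionError:
    # logs at ERROR level and appends the traceback, like exc_info=True
    log.exception('Exception occurred')

text = stream.getvalue()
print(text)
```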


#### Customization
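You can customize logging behaviour via the function [`basicConfig()`]:
- Write logs into a file by specifying a `filename` rather than printing to the console. Combine it with `filemode='w'` (overwrite) or `filemode='a'` (append, the default).
- Customize the log `format` using different [logging attributes]. We recommend pairing this parameter with `style='{'` to use native Python `str.format` placeholders such as `{asctime}`; `datefmt` controls how `{asctime}` is rendered, using `time.strftime()` codes.
- Of course, we can customize logging levels as shown in the previous section.
- `basicConfig()` does nothing if the root logger already has handlers, so only the first call in a session takes effect. Pass `force=True` (Python 3.8+) to replace the existing configuration.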

[`basicConfig()`]: https://docs.python.org/3/library/logging.html#logging.basicConfig
[logging attributes]: https://docs.python.org/3/library/logging.html#logrecord-attributes

In [8]:
import logging
logging.basicConfig(
    level=logging.INFO,
    filename='../output/mylog.log',
    filemode='w',
    style='{',
    format='{asctime} - {levelname}:{name} - {message}',
    datefmt='%H:%M:%S'
)

In [2]:
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')
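Because these messages go to `../output/mylog.log`, nothing is printed in the notebook. A self-contained sketch that writes to a temporary file instead and reads it back; note that `force=True` (Python 3.8+) reconfigures the root logger for the whole session:

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), 'mylog.log')

logging.basicConfig(
    level=logging.INFO,
    filename=log_path,
    filemode='w',
    style='{',
    format='{levelname}:{name} - {message}',
    force=True,  # drop handlers left over from earlier basicConfig() calls
)
logging.debug('hidden')           # below INFO, so it is dropped
logging.info('pipeline started')  # written to the file

with open(log_path) as f:
    lines = f.read().splitlines()
print(lines)  # ['INFO:root - pipeline started']
```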

#### Loggers
The Logging module allows creating different loggers for different purposes. This is done by passing a name to the `getLogger()` function, which Logging treats as the identifier of the logger. In other words, loggers are unique by name: calling this function multiple times with the same name returns the same logger.

In fact, the module-level functions used so far (`logging.basicConfig()`, `logging.info()`, ...) work on a logger named `root`. If the project is not too complicated, using the root logger this way is enough.

In [1]:
import logging

In [13]:
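formatter = logging.Formatter(style='{', fmt='{asctime} - {levelname}:{name} - {message}')

handler = logging.StreamHandler()
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)

logger = logging.getLogger('my_logger')
# a new logger inherits the root level (WARNING by default), so set its own level
logger.setLevel(logging.DEBUG)
# do not pass records on to the root logger's handlers (avoids duplicate lines)
logger.propagate = False
logger.addHandler(handler)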

In [14]:
logger.info('This is an info message')

2022-11-15 13:36:28,854 - INFO:my_logger - This is an info message
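A quick check of the by-name behaviour, plus the dotted-name hierarchy that `getLogger()` also supports (`my_logger.io` is a hypothetical child name):

```python
import logging

a = logging.getLogger('my_logger')
b = logging.getLogger('my_logger')
child = logging.getLogger('my_logger.io')  # dotted name -> child of 'my_logger'

print(a is b)                               # True: loggers are unique by name
print(child.parent is a)                    # True
print(logging.getLogger() is logging.root)  # True: no name -> the root logger
```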


### 1.4. Argparse
In practical deployment, running a Python file often requires passing in different parameters. For example, you may want to specify a date without modifying the source code. In this situation, we use the standard-library module [Argparse].

[Argparse]: https://docs.python.org/3/library/argparse.html

#### Arguments
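Argparse (or, more precisely, the command line) has two types of arguments: positional and optional. Optional arguments are recognized by their `-` (short form) or `--` (long form) prefix; everything else is positional and is matched by position. In the script below, `nargs='?'` lets an argument be omitted (its `default` is then used), and `action='store_true'` turns `--sigma` into a flag that is `False` unless given.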

In [11]:
%%writefile ../output/demo_argparse.py
import argparse

parser = argparse.ArgumentParser()
# positional arguments; nargs='?' makes them optional, falling back to `default`
parser.add_argument('alpha', type=int, default=1, nargs='?', help='first term')
# optional arguments, given by a short (-b) or long (--beta) flag
parser.add_argument('-b', '--beta', type=int, default=2, nargs='?')
parser.add_argument('-g', '--gamma', type=int, default=3, nargs='?')
parser.add_argument('delta', type=int, default=4, nargs='?')
# a switch: False unless --sigma is present
parser.add_argument('--sigma', action='store_true')
namespace = parser.parse_args()

print(
    f'alpha={namespace.alpha}, '
    f'beta={namespace.beta}, '
    f'gamma={namespace.gamma}, '
    f'delta={namespace.delta}, '
    f'sigma={namespace.sigma}'
)

Overwriting ../output/demo_argparse.py


#### Command line

In [12]:
!python ../output/demo_argparse.py

alpha=1, beta=2, gamma=3, delta=4, sigma=False


In [5]:
!python ../output/demo_argparse.py -h

usage: demo_argparse.py [-h] [-b [BETA]] [-g [GAMMA]] [--sigma]
                        [alpha] [delta]

positional arguments:
  alpha                 first term
  delta

optional arguments:
  -h, --help            show this help message and exit
  -b [BETA], --beta [BETA]
  -g [GAMMA], --gamma [GAMMA]
  --sigma


In [6]:
!python ../output/demo_argparse.py 10

alpha=10, beta=2, gamma=3, delta=4, sigma=False


In [7]:
!python ../output/demo_argparse.py 10 1000 -b 20 -g 30

alpha=10, beta=20, gamma=30, delta=1000, sigma=False


In [8]:
!python ../output/demo_argparse.py 10 1000 --beta 20 --gamma 30 --sigma

alpha=10, beta=20, gamma=30, delta=1000, sigma=True
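The same kind of parser can be exercised without a shell by handing `parse_args()` an explicit list of strings. A trimmed-down sketch of `demo_argparse.py` (only `alpha`, `--beta` and `--sigma`), which also shows one subtlety of `nargs='?'` on an optional argument:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('alpha', type=int, default=1, nargs='?')
parser.add_argument('-b', '--beta', type=int, default=2, nargs='?')
parser.add_argument('--sigma', action='store_true')

# parse_args() reads sys.argv by default; passing a list is handy in notebooks and tests
defaults = parser.parse_args([])
given = parser.parse_args(['10', '--beta', '20', '--sigma'])
bare = parser.parse_args(['-b'])  # '-b' without a value yields `const` (None), not `default`

print(defaults.alpha, defaults.beta, defaults.sigma)  # 1 2 False
print(given.alpha, given.beta, given.sigma)           # 10 20 True
print(bare.beta)                                      # None
```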


## 2. Databases

### 2.1. SQL Server
The `pyodbc` library is used to connect to SQL Server.

In [None]:
import pandas as pd
import pyodbc

#### Connecting
For a local database, use a trusted connection (Windows authentication). Otherwise, provide a username and password.

In [None]:
# local database information
driver = 'SQL Server Native Client 11.0'
server = ''
database = ''

# connect
connect = pyodbc.connect(
    f'Driver={{{driver}}};'
    f'Server={server};'
    f'Database={database};'
    f'Trusted_Connection=yes;'
)
cursor = connect.cursor()

In [None]:
# database with username and password (fill in your own values; keep real passwords out of notebooks)
driver = 'SQL Server'
server = ''  # 'host,port'
database = ''
username = ''
password = ''

# connect
connect = pyodbc.connect(
    f'Driver={{{driver}}};'
    f'Server={server};'
    f'Database={database};'
    f'UID={username};'
    f'PWD={password};'
)
cursor = connect.cursor()

#### Listing all tables

In [None]:
cursor.execute('''
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'dbo'
''')

[table[0] for table in cursor.fetchall()]

#### Running a query

In [None]:
query = '''
SELECT TOP 5 *
FROM PROJECT_BUDGET_ADJ
'''

pd.read_sql_query(query, connect)

#### Reading an entire table

In [None]:
table = ''

pd.read_sql_query(f'SELECT * FROM {table}', connect)
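A runnable sketch of two safety points for the queries above, using the standard-library `sqlite3` module as a stand-in for a real SQL Server or PostgreSQL connection (the `budget` table is made up): values should be passed as query parameters rather than formatted into the string (`?` in `sqlite3` and `pyodbc`, `%s` in `psycopg2`), and since a table name cannot be a parameter, it is checked against the list of existing tables first.

```python
import sqlite3

connect = sqlite3.connect(':memory:')  # stand-in for the pyodbc/psycopg2 connection
cursor = connect.cursor()
cursor.execute('CREATE TABLE budget (project TEXT, amount REAL)')
cursor.executemany('INSERT INTO budget VALUES (?, ?)', [('A', 10.0), ('B', 25.5), ('C', 7.0)])

# values go through placeholders, never through f-strings
cursor.execute('SELECT project, amount FROM budget WHERE amount > ? ORDER BY project', (8,))
rows = cursor.fetchall()
print(rows)  # [('A', 10.0), ('B', 25.5)]

# identifiers such as table names cannot be bound, so validate them first
tables = [row[0] for row in cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
table = 'budget'
if table not in tables:
    raise ValueError(f'unknown table: {table}')
cursor.execute(f'SELECT COUNT(*) FROM {table}')
count = cursor.fetchone()[0]
print(count)  # 3
```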

### 2.2. PostgreSQL
The `psycopg2` library is used to connect to PostgreSQL.

#### Connecting

In [None]:
import pandas as pd
import psycopg2

In [None]:
# database information (fill in your own values; keep real passwords out of notebooks)
host = ''
port = ''
database = ''
username = ''
password = ''

# connect
connect = psycopg2.connect(host=host, port=port, dbname=database, user=username, password=password)
cursor = connect.cursor()

#### Listing all tables

In [None]:
cursor.execute('''
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_name != 'user'
''')

[table[0] for table in cursor.fetchall()]

#### Running a query

In [None]:
query = '''
SELECT *
FROM bow
LIMIT 5
'''

pd.read_sql_query(query, connect)

#### Reading an entire table

In [None]:
table = ''

pd.read_sql_query(f'SELECT * FROM {table}', connect)

## 3. APIs
This section demonstrates the usage of the [Clash Royale API](https://developer.clashroyale.com/#/documentation).
- Go to the Clash Royale API site and register an account.
- Find the device's *public IP address* by searching for "my ip" or "what is my ip" on Google. It is sometimes called *external IP*, and should not be confused with *local/internal IP*. Another way is to run `curl ifconfig.me` on either a macOS terminal or a Windows terminal.
- Open account settings and create a new key using the *public IP* found earlier; the Clash Royale API authorizes requests by this IP. Once the key is created, copy the *token* associated with it.
- Use the `requests` library to connect to the API and get response data.

In [1]:
!curl ifconfig.me

113.190.6.123

In [2]:
import numpy as np
import pandas as pd
import requests
import json
pd.options.display.max_rows = 200

In [4]:
# '%23' is the URL-encoded '#' that starts a player tag (#2082RVQQ)
url = 'https://api.clashroyale.com/v1/players/%232082RVQQ'
token = '<paste your token here>'  # the token copied from the API key page

headers = {
    'Accept': 'application/json',
    'Authorization': f'Bearer {token}'}

response = requests.get(url=url, headers=headers)
response.raise_for_status()  # raise an HTTPError on 4xx/5xx responses
response = response.json()
data = response['cards']
data = dict(enumerate(data))  # {0: card, 1: card, ...} for DataFrame.from_dict
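The `%23` in the URL above is the percent-encoded `#` that starts every player tag. A minimal sketch, with hypothetical helper names, that builds the URL and headers with the standard library instead of hand-encoding:

```python
from urllib.parse import quote

def player_url(tag):
    """Hypothetical helper: build the player endpoint URL, percent-encoding '#' as %23."""
    return 'https://api.clashroyale.com/v1/players/' + quote(tag, safe='')

def auth_headers(token):
    """Hypothetical helper: headers for a bearer-token request."""
    return {'Accept': 'application/json', 'Authorization': f'Bearer {token}'}

print(player_url('#2082RVQQ'))  # https://api.clashroyale.com/v1/players/%232082RVQQ
```

The result can then be passed as `requests.get(url=player_url(tag), headers=auth_headers(token))`.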

In [5]:
# cards needed for each upgrade by rarity; -1 marks levels that do not exist for that rarity
dfUpgrade = pd.DataFrame({
    'level': range(1, 15),
    'Common': [0, 2, 4, 10, 20, 50, 100, 200, 400, 800, 1000, 1500, 3000, 5000],
    'Rare': [-1]*2 + [0, 2, 4, 10, 20, 50, 100, 200, 400, 500, 750, 1250],
    'Epic': [-1]*5 + [0, 2, 4, 10, 20, 40, 50, 100, 200],
    'Legendary': [-1]*8 + [0, 2, 4, 6, 10, 20],
    'Champion': [-1]*10 + [0, 2, 8, 20]
})
dfUpgrade = dfUpgrade.melt(id_vars='level', var_name='rarity', value_name='cost')
dfUpgrade = dfUpgrade.query("cost != -1")
# consumed = total cards spent to reach each level (running sum of the costs)
dfUpgrade['consumed'] = dfUpgrade.groupby('rarity').cost.cumsum()

dfMax = dfUpgrade.query('level == 14')
dfMax = dfMax[['rarity', 'consumed']]
dfMax = dfMax.rename(columns={'consumed': 'target'})

In [6]:
mapLevelRarity = {14: 'Common', 12: 'Rare', 9: 'Epic', 6: 'Legendary', 4: 'Champion'}

df = pd.DataFrame.from_dict(data, orient='index')
df = df.assign(rarity=df.maxLevel.map(mapLevelRarity))
df = df.assign(starLevel=df.starLevel.fillna(0).astype(int))
# maxLevel differs by rarity, so shift every card onto the common 1-14 level scale
df = df.assign(level=df.level+14-df.maxLevel)
df = df[['rarity', 'name', 'level', 'count']]

df = df.merge(dfUpgrade.drop(columns=['cost']), on=['rarity', 'level'])
df = df.merge(dfMax, on='rarity')
df = df.eval('collected = count + consumed')
df = df.eval('required = target - collected')

df = df.assign(rarity=pd.Categorical(df.rarity, categories=['Common', 'Rare', 'Epic', 'Legendary', 'Champion']))
df = df.sort_values(by=['rarity', 'level', 'count']).reset_index(drop=True)

df.head()

|   | rarity | name | level | count | consumed | target | collected | required |
|---|---|---|---|---|---|---|---|---|
| 0 | Common | Tesla | 13 | 3706 | 7086 | 12086 | 10792 | 1294 |
| 1 | Common | Firecracker | 13 | 3800 | 7086 | 12086 | 10886 | 1200 |
| 2 | Common | Royal Delivery | 13 | 4103 | 7086 | 12086 | 11189 | 897 |
| 3 | Common | Giant Snowball | 13 | 4362 | 7086 | 12086 | 11448 | 638 |
| 4 | Common | Skeleton Barrel | 13 | 4402 | 7086 | 12086 | 11488 | 598 |
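To see where `consumed`, `target`, `collected` and `required` come from, here is the same arithmetic in plain Python for the first row (Tesla, a Common card at level 13), reusing the Common upgrade costs from `dfUpgrade`:

```python
from itertools import accumulate

# per-level upgrade costs for Common cards, copied from dfUpgrade
common_cost = [0, 2, 4, 10, 20, 50, 100, 200, 400, 800, 1000, 1500, 3000, 5000]
consumed = dict(zip(range(1, 15), accumulate(common_cost)))  # level -> cards spent so far

count = 3706                      # Tesla's spare cards, from the table
collected = count + consumed[13]  # everything ever collected for this card
required = consumed[14] - collected  # target is the total needed for level 14

print(consumed[13], consumed[14])  # 7086 12086
print(collected, required)         # 10792 1294
```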


In [7]:
dfRequired = df.groupby('rarity').required.sum().reset_index()

dfProgress = df.groupby('rarity')[['collected', 'target']].sum().reset_index()
dfProgress = dfProgress.eval('progress = collected / target')
dfProgress = dfProgress[['rarity', 'progress']]

dfMeanCount = df.groupby('rarity').collected.mean().astype(int).reset_index()
dfMeanCount = dfMeanCount.merge(dfUpgrade.query('level==13'), on='rarity')
dfMeanCount = dfMeanCount.eval('meanCount = collected - consumed')
dfMeanCount = dfMeanCount[['rarity', 'meanCount']]

dfSummary = dfRequired.merge(dfProgress, on='rarity', how='left')
dfSummary = dfSummary.merge(dfMeanCount, on='rarity', how='left')
dfSummary['progress'] = dfSummary.progress.map(lambda x: f'{x:.2%}')
dfSummary

|   | rarity | required | progress | meanCount |
|---|---|---|---|---|
| 0 | Common | 6584 | 98.05% | 4764 |
| 1 | Rare | 2015 | 97.81% | 1178 |
| 2 | Epic | 0 | 100.00% | 200 |
| 3 | Legendary | 0 | 100.00% | 20 |
| 4 | Champion | 27 | 77.50% | 13 |
