<a href="https://colab.research.google.com/github/jpacilo/PythonWorkshop/blob/main/Lecture.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Better Practices** in Python For Data Science
⚠️ Please make a copy of this colab notebook first by clicking **File -> Save a Copy in Drive** on the menu bar <br>

## README

**Lecturer**
- Joshua Paolo Acilo
- Model Development Expert
- EDO Advanced Analytics

**Schedule**
- 1:00 - 3:00 PM Lecture
- 3:00 - 4:00 PM Homework

**Reminders**
- Feel free to ask questions anytime! You can leave a message in the chatbox or unmute yourself and speak. <br>
- This is not an Introduction to Python. I expect everyone to at least know the basics of programming. <br>
- You learn more by doing. Try to adopt these new concepts in your workflow next time!



## Setup Python

In [1]:
# check the current python version you have
import sys
sys.version

'3.7.12 (default, Jan 15 2022, 18:48:18) \n[GCC 7.5.0]'

In [2]:
# just to mute the warnings for deprecated methods
import warnings
warnings.filterwarnings("ignore")

In [3]:
# pendulum is a library to manipulate dates
!pip3 install pendulum

Collecting pendulum
  Downloading pendulum-2.1.2-cp37-cp37m-manylinux1_x86_64.whl (155 kB)
Collecting pytzdata>=2020.1
  Downloading pytzdata-2020.1-py2.py3-none-any.whl (489 kB)
Installing collected packages: pytzdata, pendulum
Successfully installed pendulum-2.1.2 pytzdata-2020.1


In [4]:
# geopandas is a library to manipulate spatial data
!pip3 install geopandas

Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1


In [None]:
# leafmap is a library to visualize spatial data
!pip3 install leafmap

In [3]:
# gives nicer output for your tests
!pip3 -q install pytest pytest-sugar

  Building wheel for pytest-sugar (setup.py) ... done


## Write Clean Code 

Any fool can write code that a computer can understand. **Good programmers write code that humans can understand.** 🤔

### Variables

**DON'T(s)**
- Thou shall not start with a number. <br>
```4ever = True```
- Thou shall not use special characters. <br>
```amountIn$ = 100```
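- Thou shall not use reserved keywords, and avoid shadowing built-ins. <br>
```class = "Economy"``` is a syntax error; ```id = 10012216``` runs, but it hides the built-in ```id()``` function.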

**DO(s)**
- PEP 8 suggests using snake_case. <br>
```lower_case_with_underscores = True```
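
The rules above can be checked programmatically. Here is a minimal sketch using only the standard library (`str.isidentifier`, the `keyword` module, and `builtins`). The helper name `check_variable_name` and the sample names are hypothetical and not part of the workshop material.

```python
import builtins
import keyword


def check_variable_name(name):
    """Return a short verdict on whether `name` is a sensible variable name."""
    if not name.isidentifier():
        # catches names that start with a digit or contain characters like "$"
        return "invalid: not a legal identifier"
    if keyword.iskeyword(name):
        # reserved words such as "class" or "for" cannot be assigned to
        return "invalid: reserved keyword"
    if hasattr(builtins, name):
        # legal, but the built-in becomes unreachable under that name
        return "legal, but shadows a built-in"
    return "ok"


# hypothetical sample names, one per rule
for candidate in ["4ever", "amountIn$", "class", "id", "customer_id"]:
    print(f"{candidate!r}: {check_variable_name(candidate)}")
```

Note that `id` passes the first two checks: it is a built-in function rather than a reserved keyword, so assigning to it is legal but easy to regret.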

Use **meaningful and pronounceable variable names.** Let the variable speak for itself. 🤯

In [None]:
import pendulum

def start_pipeline(date):
    # do stuff
    pass

# this is bad: not only is it unpronounceable, it is also vague and non-descriptive
ymddt = pendulum.now().strftime("%Y-%m-%d")
start_pipeline(ymddt)

# this is good: it gives a clue that the current date controls the timing of the pipeline
current_date = pendulum.now().strftime("%Y-%m-%d")
start_pipeline(current_date)

Of course, there will be some exceptions, especially with **domain-specific jargon.** 🧐

In [None]:
import numpy as np

# you'll see this very often in the lake
pxn_dt = pendulum.parse(current_date).subtract(days=1)

# this is boilerplate ML, so it's okay too
X, y = np.arange(10).reshape((5, 2)), range(5)

It is a fact that *we will read more code than we will ever write.* It's important that **the code is readable and searchable.** Yes, the quick and dirty way gets the same result as the slower, cleaner way, but in the long run it will hurt your readers. 😓

In [None]:
def aggregate_features(window_duration):
    # do stuff
    pass

# i'm betting you'll forget this the next time you look at your code
aggregate_features(1440)

# instead, assign the value to a descriptive constant, written in capital letters by convention
MINUTES_IN_A_DAY = 60 * 24
aggregate_features(MINUTES_IN_A_DAY)

Don't force the reader of your code to translate what the variable means. **Explicit is better than implicit.** 🤔

In [None]:
# this is bad, implicit
seq = ("Taguig", "Makati", "Mandaluyong")
for item in seq:
    # do stuff
    pass

# this is good, explicit
cities = ("Taguig", "Makati", "Mandaluyong")
for city in cities:
    # do stuff
    pass

### Functions

**Write a manual for your function using docstrings.** This will help not only you in the future, but also your future collaborators. 😉

In [6]:
from math import radians, cos, sin, asin, sqrt

# this is good, write docstrings as much as possible to future proof your work
def get_haversine_distance(lon1, lat1, lon2, lat2, r=6371):
    """Calculate the great circle distance (in kilometers) between two points on the earth.

    Args:
        lon1 (float): Longitude of Point 1
        lat1 (float): Latitude of Point 1
        lon2 (float): Longitude of Point 2
        lat2 (float): Latitude of Point 2
        r (int, optional): Radius of earth in kilometers. Defaults to 6371.

    Returns:
        float: Haversine distance between the two given coordinates.
    """

    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    
    return c * r
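
As a quick sanity check of the function above, the sketch below repeats the definition in compact form (so the block runs on its own) and measures the distance between two cafes whose coordinates are copied from the cafes table used later in this notebook. The variable names `single_origin` and `starbucks` are only for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def get_haversine_distance(lon1, lat1, lon2, lat2, r=6371):
    """Great circle distance in kilometers (same formula as above)."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * r


# coordinates copied from the cafes table used later in this notebook
single_origin = {"latitude": 14.551564, "longitude": 121.049615}
starbucks = {"latitude": 14.551219, "longitude": 121.050066}

# note the argument order: longitude first, then latitude
distance_km = get_haversine_distance(
    single_origin["longitude"], single_origin["latitude"],
    starbucks["longitude"], starbucks["latitude"],
)
print(f"{distance_km * 1000:.0f} meters apart")

# the docstring is what help() and most editor tooltips display
print(get_haversine_distance.__doc__)
```

The two cafes come out at roughly 60 meters apart, which is plausible for neighbors on the same street. Passing latitude before longitude by mistake is the most common way to get a wrong answer here, which is one more reason to document the argument order in the docstring.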

Your Python **functions should accomplish one thing.** When functions do more than one thing, they are harder to compose, test, and reason about. When you can isolate a function to just one action, it can be refactored easily and your code will read much cleaner.

In [7]:
from google.colab import drive
drive.mount("/content/drive/", force_remount=True)

Mounted at /content/drive/


In [8]:
%cd /content/drive/My Drive/DSP Python Workshop 2022/

/content/drive/My Drive/DSP Python Workshop 2022


In [9]:
import pandas as pd

def load_data(filename, schema):
    df = pd.read_csv(filename)
    df = df.astype(schema, errors="ignore")
    return df

filename = "data/cafes_in_bonifacio_global_city.csv"
schema = {
    "cafe_name": str,
    "address": str,
    "latitude": float,
    "longitude": float
} 
df = load_data(filename, schema)
display(df)

Unnamed: 0,cafe_name,address,latitude,longitude
0,Single Origin - Bonifacio High Street,"C3, Bonifacio Highstreet, 7th Ave",14.551564,121.049615
1,Frank & Dean Coffee,"Five/NEO, 31st Street, Taguig, Metro Manila",14.555208,121.043002
2,Cafe de Lipa,"H324+56C, Taguig, Metro Manila",14.550452,121.055519
3,UCC Clockwork,"Burgos Cir, Taguig Metro Manila",14.55251,121.043906
4,Coffee Project,"21st Dr, Taguig, 1630 Metro Manila",14.543019,121.047231
5,Starbucks (Bonifacio High Street),H322+F23,14.551219,121.050066
6,Wildflour Cafe + Bakery BGC,"Ground Floor Six/NEO 4th Avenue, Corner 26th S...",14.549488,121.04616
7,Luna Cafe,"Ground Floor, NAC Tower, 32nd St",14.553146,121.051328
8,Malongo Atelier Barista Philippines,"3rd Avenue, Lower Ground, One Bonifacio High S...",14.551821,121.046074
9,Highlands Coffee,"Arthaland Tower, 5th Ave",14.552934,121.047843


Suppose you and your new DSP friends want to go coffee shop hopping in Bonifacio Global City today. Since you only have an hour for lunch break, you decide to visit only n shops for now. The task is to find the n coffee shops in the given data that are closest to each other. 🧩 
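
Before any maps are involved, the core of the puzzle can be sketched with the standard library alone: enumerate every group of n cafes with `itertools.combinations` and keep the group with the lowest score. The scoring used here (the sum of pairwise haversine distances) is an assumption chosen for illustration; the notebook itself scores each group by the perimeter of the polygon its cafes form. The helper names `get_total_pairwise_distance` and `find_closest_group` are hypothetical.

```python
import itertools
from math import asin, cos, radians, sin, sqrt


def get_haversine_distance(lon1, lat1, lon2, lat2, r=6371):
    """Great circle distance in kilometers between two points."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * r


def get_total_pairwise_distance(group):
    """Sum of haversine distances (km) over every pair of cafes in the group."""
    return sum(
        get_haversine_distance(a[1], a[2], b[1], b[2])
        for a, b in itertools.combinations(group, 2)
    )


def find_closest_group(cafes, n):
    """Return the n-cafe combination with the smallest total pairwise distance."""
    return min(itertools.combinations(cafes, n), key=get_total_pairwise_distance)


# (name, longitude, latitude) for a handful of rows from the table above
cafes = [
    ("Single Origin - Bonifacio High Street", 121.049615, 14.551564),
    ("Frank & Dean Coffee", 121.043002, 14.555208),
    ("Cafe de Lipa", 121.055519, 14.550452),
    ("Starbucks (Bonifacio High Street)", 121.050066, 14.551219),
    ("Luna Cafe", 121.051328, 14.553146),
]

closest = find_closest_group(cafes, n=3)
print([name for name, _, _ in closest])
```

Each function does one thing (measure, score, search), so the scoring rule can be swapped without touching the search.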

In [10]:
import leafmap
import itertools
import geopandas as gpd
from shapely.geometry import Polygon

In [28]:
THE_GLOBE_TOWER_COORDS = (14.553474948859346, 121.04989287111896)

In [29]:
# THIS IS BAD

def get_map(df, reference_point, n=3):

    # initialize map, set TGT as reference point for BGC
    map_select = leafmap.Map(
        center=reference_point, 
        zoom=16, 
        layers_control=True, 
        measure_control=False, 
        attribution_control=False
    )
    map_select.add_basemap("Stamen.TonerLite")

    # get points of interest df
    gdf_points = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude, crs="EPSG:4326"))
    gdf_points = gdf_points.drop(columns=["address", "latitude", "longitude"])

    cols_gdf = [(f"cafe_name_{i}", f"geometry_{i}") for i in range(1, n+1)]
    cols_gdf = [item for sublist in cols_gdf for item in sublist]
    cols_geometry = [col for col in cols_gdf if "geometry" in col]

    # get all possible combinations of poi(s) e.g. cafe(s)
    points_combinations = list(itertools.combinations(gdf_points.values.tolist(), n))
    
    # get polygons df
    gdf_polygons = pd.DataFrame(columns=cols_gdf)
    for i, points_combination in enumerate(points_combinations):
        gdf_polygons.loc[i] = [item for sublist in points_combination for item in sublist]

    # add n-polygon geometry column based from the given points 
    gdf_polygons["geometry"] = gdf_polygons.apply(lambda x: Polygon([x[col] for col in cols_geometry]), axis=1)
    gdf_polygons = gdf_polygons.drop(columns=cols_geometry)

    # 4326 for viz, 3857 for distance related calculations
    gdf_polygons = gpd.GeoDataFrame(gdf_polygons, crs="EPSG:4326")
    gdf_polygons["polygon_perimeter_in_meters"] = gdf_polygons.to_crs(3857)["geometry"].length

    # add the points and polygons gdf
    map_select.add_gdf(gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=True).head(1), layer_name="Smallest Geom", fill_colors=["green"])
    map_select.add_gdf(gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=False).head(1), layer_name="Biggest Geom", fill_colors=["red"])
    map_select.add_gdf(gdf_points, layer_name="Cafes in BGC")

    return map_select


In [44]:
# THIS IS BETTER

def flatten_list(lst):
    flattened_list = [item for sublist in lst for item in sublist]
    return flattened_list

def get_column_names(n, geom):
    cols = flatten_list([(f"cafe_name_{i}", f"geometry_{i}") for i in range(1, n+1)])
    if geom:
        return [col for col in cols if "geometry" in col]
    else:
        return cols

def get_point_combinations(gdf_points, n):
    points_combinations = list(itertools.combinations(gdf_points.values.tolist(), n))
    return points_combinations

def get_gdf_points(df):
    gdf_points = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude, crs="EPSG:4326"))
    gdf_points = gdf_points.drop(columns=["address", "latitude", "longitude"])
    return gdf_points

def get_gdf_polygons(gdf_points, n):
    
    # get column names
    cols_gdf = get_column_names(n, False)
    cols_geometry = get_column_names(n, True)

    # get all possible combinations of poi(s) e.g. cafe(s)
    points_combinations = get_point_combinations(gdf_points, n)

    # create polygons table
    gdf_polygons = pd.DataFrame(columns=cols_gdf)
    for i, points_combination in enumerate(points_combinations):
        gdf_polygons.loc[i] = flatten_list(points_combination)

    # add n-polygon geometry column based from the given points 
    gdf_polygons["geometry"] = gdf_polygons.apply(lambda x: Polygon([x[col] for col in cols_geometry]), axis=1)
    gdf_polygons = gdf_polygons.drop(columns=cols_geometry)
    
    # 4326 for viz; 3857 (Web Mercator) for length calculations
    # note: 3857 lengths are approximate meters, inflated by about 3% at this latitude
    gdf_polygons = gpd.GeoDataFrame(gdf_polygons, crs="EPSG:4326")
    gdf_polygons["polygon_perimeter_in_meters"] = gdf_polygons.to_crs(3857)["geometry"].length

    return gdf_polygons

def get_map(reference_point, gdf_points, gdf_polygons):

    # initialize map, set TGT as reference point for BGC
    map_select = leafmap.Map(
        center=reference_point, 
        zoom=16, 
        layers_control=True, 
        measure_control=False, 
        attribution_control=False
    )
    map_select.add_basemap("Stamen.TonerLite")

    # add the points and polygons gdf
    map_select.add_gdf(gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=True).head(1), layer_name="Smallest Geom", fill_colors=["green"])
    map_select.add_gdf(gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=False).head(1), layer_name="Biggest Geom", fill_colors=["red"])
    map_select.add_gdf(gdf_points, layer_name="Cafes in BGC")

    return map_select

In [45]:
gdf_points = get_gdf_points(df)
gdf_points.head(1)

Unnamed: 0,cafe_name,geometry
0,Single Origin - Bonifacio High Street,POINT (121.04961 14.55156)


In [46]:
gdf_polygons = get_gdf_polygons(gdf_points, 4)
gdf_polygons.head(1)

Unnamed: 0,cafe_name_1,cafe_name_2,cafe_name_3,cafe_name_4,geometry,polygon_perimeter_in_meters
0,Single Origin - Bonifacio High Street,Frank & Dean Coffee,Cafe de Lipa,UCC Clockwork,"POLYGON ((121.04961 14.55156, 121.04300 14.555...",4302.791581


In [47]:
gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=True).head(1)

Unnamed: 0,cafe_name_1,cafe_name_2,cafe_name_3,cafe_name_4,geometry,polygon_perimeter_in_meters
561,Single Origin - Bonifacio High Street,Starbucks (Bonifacio High Street),The Coffee Bean & Tea Leaf,St. Louis Cafe,"POLYGON ((121.04961 14.55156, 121.05007 14.551...",416.151742


In [48]:
gdf_polygons.sort_values(by="polygon_perimeter_in_meters", ascending=False).head(1)

Unnamed: 0,cafe_name_1,cafe_name_2,cafe_name_3,cafe_name_4,geometry,polygon_perimeter_in_meters
969,Frank & Dean Coffee,Cafe de Lipa,UCC Clockwork,Coffee Project,"POLYGON ((121.04300 14.55521, 121.05552 14.550...",5442.470451


In [50]:
# get_map(THE_GLOBE_TOWER_COORDS, gdf_points, gdf_polygons)
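
One caveat on the perimeters above: EPSG:3857 (Web Mercator) does not preserve distances. On the spherical Mercator projection, lengths are stretched by a factor of about 1 / cos(latitude), so the values in `polygon_perimeter_in_meters` slightly overstate ground distances. Here is a rough sketch of the size of that effect at BGC's latitude, using the reference point defined earlier. The helper name `get_web_mercator_scale_factor` is hypothetical.

```python
from math import cos, radians

# (latitude, longitude) of the reference point defined earlier in the notebook
THE_GLOBE_TOWER_COORDS = (14.553474948859346, 121.04989287111896)


def get_web_mercator_scale_factor(latitude_in_degrees):
    """Approximate local length inflation of EPSG:3857 at a given latitude."""
    return 1 / cos(radians(latitude_in_degrees))


latitude, _ = THE_GLOBE_TOWER_COORDS
scale_factor = get_web_mercator_scale_factor(latitude)
print(f"scale factor at BGC: {scale_factor:.4f}")

# smallest perimeter reported above, measured in EPSG:3857 units
projected_perimeter = 416.151742
approx_ground_perimeter = projected_perimeter / scale_factor
print(f"approximate ground perimeter: {approx_ground_perimeter:.0f} m")
```

Because all the cafes sit at nearly the same latitude, the inflation is almost uniform and should not change which polygon ranks smallest or biggest. When the absolute numbers matter, a local projected CRS is the usual choice; for this area that would likely be a UTM zone 51N system such as EPSG:32651.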

## Write Tested Code
Just because you've counted all the trees **doesn't mean you've seen the forest.** 🤔

Basically, you should write tests for your data science projects because testing:
- allows collaborators to **understand your code better**
- confirms that the code is **working as expected**
- helps in detecting **edge cases** or scenarios


In [1]:
from google.colab import drive
drive.mount("/content/drive/", force_remount=True)

Mounted at /content/drive/


In [24]:
%cd /content/drive/My Drive/DSP Python Workshop 2022/

/content/drive/My Drive/DSP Python Workshop 2022


Suppose we have this function that identifies the sentiment of an English text. 🧐

In [46]:
from textblob import TextBlob

def extract_sentiment(text: str):
    """Extract text sentiments using textblob library
    Args:
        text (str): English text
    Returns:
        float: Polarity of the sentiment ranging from -1 to 1
    """

    text = TextBlob(text)
    sentiment = text.sentiment.polarity
    
    return sentiment

Since we will be using this library for the first time, we don't know how it reacts to different scenarios. We want to make sure that this tool or model is reliable, so **we will be testing it against multiple text inputs**, from the obvious scenarios to the rare ones or the edge cases.

In [47]:
extract_sentiment("The weather is beautiful today!")

1.0

In [49]:
extract_sentiment("I had a bad meeting yesterday.")

-0.6999999999999998

We want to repeat this kind of check in the future, so it is better to organize the checks as reusable, modular tests. For that we will use *pytest*, a **framework that makes it easy to write small, readable tests** and can scale to support complex functional testing for applications and libraries.

In [54]:
ls

data/  src/  tests/


This is the structure of our simple test project: <br>

> DSP Python Workshop 2022

>> src
>>> sentiment.py

>> tests
>>> test_sentiment.py

In [56]:
%%file src/sentiment.py

from textblob import TextBlob

def extract_sentiment(text: str):
    """Extract text sentiments using textblob library
    Args:
        text (str): English text
    Returns:
        float: Polarity of the sentiment ranging from -1 to 1
    """

    text = TextBlob(text)
    sentiment = text.sentiment.polarity
    
    return sentiment

Overwriting src/sentiment.py


In [65]:
%%file tests/test_sentiment.py

import sys
import os.path
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
)
from src.sentiment import extract_sentiment

def test_extract_sentiment_positive():

    text = "I did well on the exam last week."
    sentiment = extract_sentiment(text)

    assert sentiment > 0

def test_extract_sentiment_negative():

    text = "This workshop is pretty basic and boring!"
    sentiment = extract_sentiment(text)

    assert sentiment < 0

def test_extract_sentiment_neutral():

    text = "..."
    sentiment = extract_sentiment(text)

    assert sentiment == 0

def test_extract_sentiment_filipino():

    text = "Nakakaengganyo pakinggan ang guro namin sa workshop"
    sentiment = extract_sentiment(text)

    assert sentiment > 0

Overwriting tests/test_sentiment.py


We will be calling *pytest* from the terminal. It collects the functions in our script whose names start with **test** and runs each one. 🤯

In [66]:
!python3 -m pytest -vv tests/test_sentiment.py

Test session starts (platform: linux, Python 3.7.12, pytest 3.6.4, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/DSP Python Workshop 2022, inifile:
plugins: typeguard-2.7.1, sugar-0.9.4


――――――――――――――――――――――― test_extract_sentiment_positive ――――――――――――――――――――――――

    def test_extract_sentiment_positive():

        text = "I did well on the exam last week."
        sentiment = extract_sentiment(text)

>       assert sentiment > 0
E       assert 0.0 > 0

tests/test_sentiment.py:14: AssertionError

 tests/test_sentiment.py::test_extract_sentiment_positive ⨯       25% ██▌
 tests/test_sentiment.py::test_extract_sentiment_negative ✓       50% █████
 tests/test_sentiment.py::test_extract_sentiment_neutral

From the pytest output shown, we can see the scenarios where the function fails (e.g. the positive and Filipino test inputs) and where it succeeds. From this exercise, **we are not only able to know whether our function works as expected but also know why it doesn't work.** Based on the result of the positive test input, we know that this sentiment model from textblob isn't correct all the time. As developers, we can now make an informed decision on what to do next. This shows the value of testing your work before using it in production. 🤩
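
The sentiment tests can only assert a sign because the model's output is fuzzy. For deterministic code, the assertions can be exact. As a sketch, here are two tests for the `get_haversine_distance` function from earlier in the notebook. The function is repeated inline so the block runs on its own, and the test names are hypothetical. Because pytest tests are plain functions built on `assert`, they can also be called directly.

```python
from math import asin, cos, isclose, radians, sin, sqrt


def get_haversine_distance(lon1, lat1, lon2, lat2, r=6371):
    """Great circle distance in kilometers between two points."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * r


def test_get_haversine_distance_same_point():
    # a point is zero kilometers away from itself
    assert get_haversine_distance(121.05, 14.55, 121.05, 14.55) == 0


def test_get_haversine_distance_one_degree_on_equator():
    # one degree of longitude on the equator spans r * pi / 180 kilometers
    expected = 6371 * radians(1)
    assert isclose(get_haversine_distance(0, 0, 1, 0), expected, rel_tol=1e-9)


# pytest would collect these automatically; calling them by hand also works
test_get_haversine_distance_same_point()
test_get_haversine_distance_one_degree_on_equator()
print("all distance tests passed")
```

Comparing floats with `isclose` instead of `==` keeps the second test from failing on harmless rounding differences.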

We can also test multiple inputs using ```pytest.mark.parametrize```

In [78]:
%%file tests/test_sentiment.py

import sys
import os.path
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
)
import pytest
from src.sentiment import extract_sentiment

test_inputs_positive = [
    "I am blessed with a wonderful family.",
    "I am thankful for my company.",
    "I am grateful for my friends."
]

test_inputs_negative = [
    "I feel bad for leaving the party early last night.",
    "I am still disappointed from my performance last week.",
    "I am too sick to travel tomorrow."
]

@pytest.mark.parametrize("text", test_inputs_positive)
def test_extract_sentiment_positive(text):

    sentiment = extract_sentiment(text)

    assert sentiment > 0

@pytest.mark.parametrize("text", test_inputs_negative)
def test_extract_sentiment_negative(text):

    sentiment = extract_sentiment(text)

    assert sentiment < 0

Overwriting tests/test_sentiment.py


In [81]:
!python3 -m pytest -vv tests/test_sentiment.py

Test session starts (platform: linux, Python 3.7.12, pytest 3.6.4, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/DSP Python Workshop 2022, inifile:
plugins: typeguard-2.7.1, sugar-0.9.4

 tests/test_sentiment.py::test_extract_sentiment_positive[I am blessed with a wonderful family.] ✓17% █▋

―――――――― test_extract_sentiment_positive[I am thankful for my company.] ――――――――

text = 'I am thankful for my company.'

    @pytest.mark.parametrize("text", test_inputs_positive)
    def test_extract_sentiment_positive(text):

        sentiment = extract_sentiment(text)

>       assert sentiment > 0
E       assert 0.0 > 0

tests/test_sentiment.py:27: AssertionError

 tests/test_sentiment.py::test_extract_sentiment_positive[I am thankful for my company.] ⨯33% █

There will come a time when the test cases in your script become lengthy and comprehensive. We can choose to run one specific test function at a time using this syntax ```pytest file.py::function_name```

In [82]:
!python3 -m pytest -vv tests/test_sentiment.py::test_extract_sentiment_positive

Test session starts (platform: linux, Python 3.7.12, pytest 3.6.4, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/DSP Python Workshop 2022, inifile:
plugins: typeguard-2.7.1, sugar-0.9.4

 tests/test_sentiment.py::test_extract_sentiment_positive[I am blessed with a wonderful family.] ✓33% ███▍

―――――――― test_extract_sentiment_positive[I am thankful for my company.] ――――――――

text = 'I am thankful for my company.'

    @pytest.mark.parametrize("text", test_inputs_positive)
    def test_extract_sentiment_positive(text):

        sentiment = extract_sentiment(text)

>       assert sentiment > 0
E       assert 0.0 > 0

tests/test_sentiment.py:27: AssertionError

 tests/test_sentiment.py::test_extract_sentiment_positive[I am thankful for my company.] ⨯67% █

We can also reuse the same test input data across different test functions using ```pytest.fixture```

In [88]:
%%file tests/test_sentiment.py

import sys
import os.path
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir))
)
import pytest
from src.sentiment import extract_sentiment

@pytest.fixture
def sample_data():
    return "I had mixed feelings about the concert last night."

def test_extract_sentiment_positive(sample_data):

    sentiment = extract_sentiment(sample_data)

    assert sentiment > 0

def test_extract_sentiment_negative(sample_data):

    sentiment = extract_sentiment(sample_data)

    assert sentiment < 0

def test_extract_sentiment_neutral(sample_data):

    sentiment = extract_sentiment(sample_data)

    assert sentiment == 0

Overwriting tests/test_sentiment.py


In [89]:
!python3 -m pytest -vv tests/test_sentiment.py

Test session starts (platform: linux, Python 3.7.12, pytest 3.6.4, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/DSP Python Workshop 2022, inifile:
plugins: typeguard-2.7.1, sugar-0.9.4


――――――――――――――――――――――― test_extract_sentiment_positive ――――――――――――――――――――――――

sample_data = 'I had mixed feelings about the concert last night.'

    def test_extract_sentiment_positive(sample_data):

        sentiment = extract_sentiment(sample_data)

>       assert sentiment > 0
E       assert 0.0 > 0

tests/test_sentiment.py:18: AssertionError

 tests/test_sentiment.py::test_extract_sentiment_positive ⨯       33% ███▍

――――――――――――――――――――――― test_extract_sentiment_negative ――――――――――――――――――――――――

sample_data = 'I had mixed feelings about the concert last night.'

    def test_extract_sentiment_negative(sample_data

## Write Performant Code
Efficiency is doing better what is already being done. 🤔

### Python

### Pandas