![ine-divider](https://user-images.githubusercontent.com/7065401/92672068-398e8080-f2ee-11ea-82d6-ad53f7feb5c0.png)
<hr>

# PostgreSQL for Python Developers

## User-defined functions on PostgreSQL

In this project, you will enhance PostgreSQL to perform operations not available by default in queries.

You will need access to a PostgreSQL installation where you have superuser permissions. If you do not have such access elsewhere, installing PostgreSQL on your personal workstation is a good idea. Alternatively, you might use a Docker container for a self-contained installation; see `https://hub.docker.com/_/postgres` for details on that option. Unless you have a specific need to work with an existing installation, choose PostgreSQL version 12 or higher.

![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

## Part 1

**Calculating geometric mean**

In this task, we will continue to use the airline tweets table that has been used in previous projects.

In [1]:
import pandas as pd
import psycopg2
cred = dict(user='ine_student', password='ine-password', database='ine', host='localhost')
conn = psycopg2.connect(**cred)
cur = conn.cursor()

Sometimes we wish to perform queries that describe aggregations of the data. For example:

In [2]:
sql = """
SELECT airline, avg(airline_sentiment_confidence), avg(negativereason_confidence)
FROM tweets
GROUP BY airline;
"""
cur.execute(sql)
pd.DataFrame(cur.fetchall(), columns=['airline', 'avg_sentiment_conf', 'avg_neg_conf'])

|   | airline | avg_sentiment_conf | avg_neg_conf |
|---|---------|--------------------|--------------|
| 0 | Virgin America | 0.83876 | 0.556985 |
| 1 | Southwest | 0.867695 | 0.601508 |
| 2 | Delta | 0.847048 | 0.573097 |
| 3 | American | 0.912919 | 0.665752 |
| 4 | US Airways | 0.919169 | 0.687172 |
| 5 | United | 0.896252 | 0.627285 |


For this task, you would like to make a similar report, but using the geometric mean rather than the arithmetic mean. Since a default PostgreSQL installation does not include a geometric mean function, you will have to create one. The PostgreSQL documentation discusses user-defined aggregates in more depth than the lesson itself. See:

> https://www.postgresql.org/docs/current/xaggr.html

You may also wish to look at the Python `statistics` module for an implementation of geometric mean:

> https://docs.python.org/3/library/statistics.html
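
As a starting point for thinking about the aggregate, here is a minimal Python sketch of the arithmetic involved. It assumes the common formulation of the geometric mean as the exponential of the mean of the logarithms, and it mirrors the shape that PostgreSQL user-defined aggregates take: a state transition step applied once per row, followed by a final step. The names `gm_accumulate`, `gm_final`, and the `(sum_of_logs, count)` state are hypothetical choices for illustration, not part of the project or of PostgreSQL.

```python
import math
import statistics
from functools import reduce


def gm_accumulate(state, value):
    """Transition step: fold one value into the (sum_of_logs, count) state."""
    if value is None:
        # Skip missing values, similar to how SQL aggregates usually ignore NULLs.
        return state
    total, count = state
    # math.log raises ValueError for zero or negative input.
    return (total + math.log(value), count + 1)


def gm_final(state):
    """Final step: turn the accumulated state into the geometric mean."""
    total, count = state
    if count == 0:
        return None  # no rows seen, analogous to a NULL result
    return math.exp(total / count)


def geometric_mean(values):
    """Run the transition step over all values, then the final step."""
    return gm_final(reduce(gm_accumulate, values, (0.0, 0)))


# Hypothetical sample data; compare against the standard library (Python 3.8+).
sample = [0.5, 0.8, 1.0]
print(geometric_mean(sample), statistics.geometric_mean(sample))
```

Note that the logarithm is undefined for zero and negative values, so any SQL version of this idea needs a decision about how to treat such rows.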

In [1]:
# your code goes here


![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)

## Part 2

**Create a view into averages**

For this task, suppose that users frequently wish to see the average sentiment confidence for each airline. They would like to query both geometric and arithmetic means as if they were simple columns. For example:

```
ine=# SELECT * FROM airline_confidences;

 airline        |  gm_conf | gm_neg_conf | avg_conf | avg_neg_conf
----------------+----------+-------------+----------+-------------
 Virgin America | 0.885061 | 0.675881    | 0.901733 | 0.717003
 Southwest      | 0.905992 | 0.688806    | 0.920533 | 0.732866
 Delta          | 0.883982 | 0.66428     | 0.902202 | 0.71052 
 American       | 0.934933 | 0.700478    | 0.945037 | 0.744327
 US Airways     | 0.935062 | 0.70556     | 0.945714 | 0.750028
 United         | 0.920436 | 0.668733    | 0.933383 | 0.714719
```
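
One possible shape for such a view is sketched below. It assumes a geometric mean aggregate already exists under the hypothetical name `gmean`; the view name and column aliases are taken from the sample output above, and the source columns from the Part 1 query. The sample averages differ from those in Part 1, so the intended view presumably restricts the rows in some way; this sketch makes no attempt to reproduce that.

```python
# Hypothetical view definition; `gmean` is an assumed user-defined aggregate.
VIEW_SQL = """
CREATE OR REPLACE VIEW airline_confidences AS
SELECT airline,
       gmean(airline_sentiment_confidence) AS gm_conf,
       gmean(negativereason_confidence) AS gm_neg_conf,
       avg(airline_sentiment_confidence) AS avg_conf,
       avg(negativereason_confidence) AS avg_neg_conf
FROM tweets
GROUP BY airline;
"""


def create_view(cur, sql=VIEW_SQL):
    """Execute the view definition on any DB-API cursor (e.g. from psycopg2)."""
    cur.execute(sql)
```

With the `cur` and `conn` objects created earlier, this would be called as `create_view(cur)` followed by `conn.commit()` so that the view persists beyond the current transaction.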

In [2]:
# your code goes here


![orange-divider](https://user-images.githubusercontent.com/7065401/92672455-187a5f80-f2ef-11ea-890c-40be9474f7b7.png)