## Connecting to a PostgreSQL database

In [70]:
# Import create_engine to build the engine, inspect to list table names, and text to wrap raw SQL
from sqlalchemy import create_engine, inspect, text

# Import MetaData and Table to define and reflect database schema
from sqlalchemy import MetaData, Table

# Define the database connection URL with PostgreSQL dialect and credentials
database_url = (
    'postgresql+psycopg2://'  # Dialect and driver: 'postgresql' specifies the database type, 'psycopg2' is the Python adapter for PostgreSQL
    'student:'               # Username: 'student' is the user accessing the database
    'datacamp'               # Password: 'datacamp' is the password for the 'student' user
    '@'                      # Separator: separates credentials from the host information
    'postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com'  # Host: the Amazon RDS endpoint where the database is hosted
    ':5432'                  # Port: '5432' is the default PostgreSQL port number
    '/census'                # Database name: 'census' is the specific database within the PostgreSQL server to connect to
)

# Create an engine to the census database
engine = create_engine(database_url)

# Create a connection on engine
connection = engine.connect()

# Use the inspector to get table names
inspector = inspect(engine)
print(inspector.get_table_names())

# Create a MetaData object to hold database schema information
metadata = MetaData()

# Reflect the 'census' table from the PostgreSQL database using the engine
# 'autoload_with=engine' tells SQLAlchemy to load the table structure from the database
census = Table('census', metadata, autoload_with=engine)

# Print the table metadata (shows table name, columns, etc.)
print(repr(census))

['datatrial', 'datatrial2', 'data', 'census', 'new_data', 'census1', 'data1', 'employees', 'employees3', 'employees_2', 'nyc_jobs', 'final_orders', 'state_fact', 'orders', 'users', 'vrska']
Table('census', MetaData(), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)
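To make the anatomy of the connection URL above easier to see, here is a minimal sketch that splits the same string into its parts using only the standard library. SQLAlchemy has its own URL handling; `urlsplit` is used here purely for illustration, and the names `parts`, `dialect`, `driver`, and `database_name` are introduced for this example.

```python
from urllib.parse import urlsplit

# Same URL as in the cell above, written as one string
database_url = (
    "postgresql+psycopg2://student:datacamp"
    "@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census"
)

parts = urlsplit(database_url)

# The scheme carries both the dialect and the driver, joined by '+'
dialect, _, driver = parts.scheme.partition("+")

# The path is '/census'; dropping the leading slash leaves the database name
database_name = parts.path.lstrip("/")

print("dialect: ", dialect)
print("driver:  ", driver)
print("username:", parts.username)
print("host:    ", parts.hostname)
print("port:    ", parts.port)
print("database:", database_name)
```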


## Viewing Table details
Great job reflecting the census table! Now you can begin to learn more about the columns and structure of your table. It is important to get an understanding of your database by examining the column names. This can be done by using the .columns attribute and accessing the .keys() method. For example, census.columns.keys() would return a list of column names of the census table.

Following this, we can use the metadata container to find out more details about the reflected table, such as the columns and their types. For example, information about the table objects is stored in the metadata.tables dictionary, so you can get the metadata of your census table with metadata.tables['census']. This is similar to your use of the repr() function on the census table from the previous exercise.

The code for connecting to the engine and initializing the metadata you wrote in the previous exercises is displayed for you again and for the last time. From now on and until Chapter 5, this will usually be done behind the scenes.

In [72]:
# Print a list of column names of the census table by applying the .keys() method to census.columns
print(census.columns.keys())

# Print the details of the census table: access the 'census' key of the
# metadata.tables dictionary and pass it to repr()
print(repr(metadata.tables['census']))

['state', 'sex', 'age', 'pop2000', 'pop2008']
Table('census', MetaData(), Column('state', VARCHAR(length=30), table=<census>), Column('sex', VARCHAR(length=1), table=<census>), Column('age', INTEGER(), table=<census>), Column('pop2000', INTEGER(), table=<census>), Column('pop2008', INTEGER(), table=<census>), schema=None)
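The same inspection steps can be tried without access to the remote server. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL, with a hand-written `CREATE TABLE` that mimics the census columns shown above. The variables `column_names` and `same_object` are introduced only for this example.

```python
from sqlalchemy import MetaData, Table, create_engine, text

# Assumption: in-memory SQLite stands in for the PostgreSQL census database
engine = create_engine("sqlite://")

# Create an empty table with the same column layout as the reflected census table
with engine.begin() as connection:
    connection.execute(text(
        "CREATE TABLE census ("
        "state VARCHAR(30), sex VARCHAR(1), age INTEGER, "
        "pop2000 INTEGER, pop2008 INTEGER)"
    ))

# Reflect the table structure from the database
metadata = MetaData()
census = Table("census", metadata, autoload_with=engine)

# .columns.keys() gives the column names as a list
column_names = census.columns.keys()
print(column_names)

# metadata.tables maps each table name to the reflected Table object
same_object = metadata.tables["census"] is census
print(same_object)

# Each Column exposes its name and type
for column in census.columns:
    print(column.name, column.type)
```

Iterating over `census.columns` is a convenient alternative to reading the long `repr()` output when you only need names and types.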


## Selecting data from a Table: raw SQL

In [78]:
# Build select statement for census table using text() to make it executable
stmt = text('SELECT * FROM census')
#stmt = text('SELECT * FROM state_fact')  # Try 'state_fact' or another table from the list

# Execute the statement and fetch the results
results = connection.execute(stmt).fetchall()

# Print results
print(results)

# Close the connection
connection.close()

InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

[SQL: SELECT * FROM census]
(Background on this error at: https://sqlalche.me/e/20/2j85)

## Calculating a difference between two columns

Often, you'll need to perform math operations as part of a query, for example to calculate the change in population from 2000 to 2008. For math operations on numbers, the operators in SQLAlchemy work the same way as they do in Python.

You can use these operators to perform addition (+), subtraction (-), multiplication (*), division (/), and modulus (%) operations. Note: they behave differently when used with non-numeric column types; for example, + applied to two string columns concatenates them.
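To see what the operators produce, you can build the expressions and print the SQL they compile to, without connecting to any database. This sketch assumes a hand-declared census table with the same column names as the reflected one. The variable names `numeric_sql` and `string_sql` are introduced for illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

# Assumption: a hand-declared table that mirrors the reflected census columns
metadata = MetaData()
census = Table(
    "census",
    metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)

# Subtraction on integer columns becomes SQL subtraction
numeric_expr = census.columns.pop2008 - census.columns.pop2000
numeric_sql = str(numeric_expr)
print(numeric_sql)

# '+' on string columns becomes SQL string concatenation
string_expr = census.columns.state + census.columns.sex
string_sql = str(string_expr)
print(string_sql)
```

Calling str() on an expression uses SQLAlchemy's default dialect, so the exact text can differ slightly from what a specific database driver would receive.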

Let's now find the top 5 states by population growth between 2000 and 2008.

Instructions
* Define a select statement called stmt to return:
    * The state column of the census table (census.columns.state).
    * The difference in population count between 2008 (census.columns.pop2008) and 2000 (census.columns.pop2000), labeled as 'pop_change'.
* Group the statement by census.columns.state.
* Order the statement by population change ('pop_change') in descending order by passing desc('pop_change') to .order_by().
* Use the .limit() method on the previous statement to return only 5 records.
* Execute the statement and fetchall() the records.
* The print statement has already been written for you. Submit the answer to view the results!


In [59]:
from sqlalchemy import func, select

# Build query to return state names by population difference from 2008 to 2000: stmt
# SQLAlchemy 2.0 takes columns positionally; the legacy list form select([...]) raises ArgumentError.
# func.sum() aggregates the difference per state, which PostgreSQL requires once the query is grouped.
stmt = select(census.columns.state, func.sum(census.columns.pop2008 - census.columns.pop2000).label("pop_change"))

In [None]:
# desc() builds the descending ORDER BY clause
from sqlalchemy import desc

# Append group by for the state: stmt_grouped
stmt_grouped = stmt.group_by(census.columns.state)

# Append order by for pop_change in descending order: stmt_ordered
stmt_ordered = stmt_grouped.order_by(desc('pop_change'))

# Return only 5 results: stmt_top5
stmt_top5 = stmt_ordered.limit(5)

# The earlier connection was closed, so open a fresh one to execute stmt_top5 and fetch all results
with engine.connect() as connection:
    results = connection.execute(stmt_top5).fetchall()

# Print the state and population change for each record
for result in results:
    print('{}:{}'.format(result.state, result.pop_change))
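For reference, here is a self-contained sketch of the whole exercise that runs without the remote server. It makes several assumptions: SQLAlchemy 1.4 or later (where select() takes columns as positional arguments), an in-memory SQLite database in place of PostgreSQL, and four made-up rows that are not real census figures. It also wraps the difference in func.sum(), because the table holds several rows per state and PostgreSQL rejects a non-aggregated expression in a grouped query.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table,
    create_engine, desc, func, select,
)

# Assumption: in-memory SQLite stands in for the PostgreSQL census database
engine = create_engine("sqlite://")
metadata = MetaData()
census = Table(
    "census",
    metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)
metadata.create_all(engine)

# Made-up sample rows for illustration only
rows = [
    {"state": "Texas", "sex": "F", "age": 0, "pop2000": 100, "pop2008": 150},
    {"state": "Texas", "sex": "M", "age": 0, "pop2000": 110, "pop2008": 170},
    {"state": "Ohio", "sex": "F", "age": 0, "pop2000": 90, "pop2008": 95},
    {"state": "Ohio", "sex": "M", "age": 0, "pop2000": 95, "pop2008": 90},
]

# Total change per state, labeled so ORDER BY can refer to it by name
pop_change = func.sum(
    census.columns.pop2008 - census.columns.pop2000
).label("pop_change")

stmt = (
    select(census.columns.state, pop_change)  # columns passed positionally
    .group_by(census.columns.state)
    .order_by(desc("pop_change"))
    .limit(5)
)

# engine.begin() commits on success and rolls back on error
with engine.begin() as connection:
    connection.execute(census.insert(), rows)
    results = connection.execute(stmt).fetchall()

# Print the state and population change for each record
for result in results:
    print("{}:{}".format(result.state, result.pop_change))
```

Building the statement in one chained expression is equivalent to the step-by-step stmt_grouped, stmt_ordered, and stmt_top5 variables; each method returns a new statement rather than modifying the previous one.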