# Exploring your working directory
In order to import data into Python, you should first have an idea of what files are in your working directory.

IPython, which is running on DataCamp's servers, has a bunch of cool commands, including its magic commands. For example, starting a line with ! gives you complete system shell access, so the IPython command ! ls displays the contents of your current directory. Your task is to use ! ls to check out the contents of your current directory and answer the following question: which of the following files is in your working directory?

In [1]:
"""Possible Answers

huck_finn.txt

titanic.csv

moby_dick.txt"""

#Answer= moby_dick.txt

# Importing entire text files
In this exercise, you'll be working with the file moby_dick.txt. It is a text file that contains the opening sentences of Moby Dick, one of the great American novels! Here you'll get experience opening a text file, printing its contents to the shell and, finally, closing it.

In [None]:
# Open a file: file
file = open('moby_dick.txt', mode='r')  # mode='r' opens the file read-only

# Print it
print(file.read()) 

# Check whether file is closed
print(file.closed)

# Close file
file.close()

# Check whether file is closed
print(file.closed)

# Importing text files line by line
For large files, you may not want to print all of their content to the shell; you may wish to print only the first few lines. Enter the readline() method, which allows you to do this. When a file called file is open, you can print out the first line by executing file.readline(). If you execute the same command again, the second line will print, and so on.

In the introductory video, Hugo also introduced the concept of a context manager. He showed that you can bind a variable file by using a context manager construct:

with open('huck_finn.txt') as file:

While still within this construct, the variable file will be bound to open('huck_finn.txt'); thus, to print the file to the shell, all the code you need to execute is:

with open('huck_finn.txt') as file:
    print(file.readline())

You'll now use these tools to print the first few lines of moby_dick.txt!

In [None]:
"""Open moby_dick.txt using the with context manager and the variable file.
Print the first three lines of the file to the shell by using readline() three times within 
the context manager.
"""
# Read & print the first 3 lines
with open('moby_dick.txt') as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())
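
The same pattern can be tried without the course files. The sketch below is a minimal example, assuming nothing about moby_dick.txt: it first writes a small stand-in file (the name sample.txt and its contents are made up for illustration) to a temporary directory, then reads two lines with readline() inside a context manager and checks that the file is closed afterwards.

```python
import os
import tempfile

# Create a small stand-in text file (hypothetical content, not moby_dick.txt)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

# Each readline() call returns the next line, including its trailing newline
with open(path) as file:
    first = file.readline()
    second = file.readline()
    print(first, end='')   # end='' avoids printing a second newline
    print(second, end='')

# Leaving the with block closes the file automatically
print(file.closed)
```

Because the context manager closes the file for you, there is no need to call file.close() as in the earlier exercise.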

In [2]:
"""Pop quiz: examples of flat files
You're now well-versed in importing text files and you're about to become a wiz at importing flat files. But can you remember exactly what a flat file is? Test your knowledge by answering the following question: which of these file types below is NOT an example of a flat file?

Possible Answers

A .csv file.

A tab-delimited .txt.

A relational database (e.g. PostgreSQL)."""
#answer= relational database (e.g. PostgreSQL).

In [3]:
"""Pop quiz: what exactly are flat files?
Which of the following statements about flat files is incorrect?

Possible Answers

Flat files consist of rows and each row is called a record.

Flat files consist of multiple tables with structured relationships between the tables.

A record in a flat file is composed of fields or attributes, each of which contains at most one item of information.

Flat files are pervasive in data science."""
#answer=Flat files consist of multiple tables with structured relationships 
#between the tables.

In [4]:
"""Why we like flat files and the Zen of Python
In PythonLand, there are currently hundreds of Python Enhancement Proposals, commonly referred to as PEPs. PEP 8, for example, is a standard style guide for Python, co-written by our sensei Guido van Rossum himself. It is the basis for how we here at DataCamp ask our instructors to style their code. Another one of my favorites is PEP 20, commonly called the Zen of Python. Its abstract is as follows:

Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.

If you don't know what the acronym BDFL stands for, I suggest that you look here. You can print the Zen of Python in your shell by typing import this into it! You're going to do this now and the 5th aphorism (line) will say something of particular interest.

The question you need to answer is: what is the 5th aphorism of the Zen of Python?

Possible Answers

Flat is better than nested.

Flat files are essential for data science.

The world is representable as a flat file.

Flatness is in the eye of the beholder."""
#answer=Flat is better than nested

# Using NumPy to import flat files
In this exercise, you're now going to load the MNIST digit recognition dataset using the numpy function loadtxt() and see just how easy it can be:

The first argument will be the filename.
The second will be the delimiter which, in this case, is a comma.
You can find more information about the MNIST dataset here on the webpage of Yann LeCun, who is currently Director of AI Research at Facebook and Founding Director of the NYU Center for Data Science, among many other things.

In [None]:
# Import packages
import numpy as np
import matplotlib.pyplot as plt

# Assign filename to variable: file
file = 'digits.csv'

# Load file as array: digits
digits = np.loadtxt(file, delimiter=',')

# Print datatype of digits
print(type(digits))

# Select and reshape a row
im = digits[21, 1:]
im_sq = np.reshape(im, (28, 28))

# Plot reshaped data (matplotlib.pyplot already loaded as plt)
plt.imshow(im_sq, cmap='Greys', interpolation='nearest')
plt.show()

# Customizing your NumPy import
What if there are rows, such as a header, that you don't want to import? What if your file has a delimiter other than a comma? What if you only wish to import particular columns?

There are a number of arguments that np.loadtxt() takes that you'll find useful:

- delimiter changes the delimiter that loadtxt() is expecting.
  - You can use ',' for comma-delimited.
  - You can use '\t' for tab-delimited.
- skiprows allows you to specify how many rows (not indices) you wish to skip.
- usecols takes a list of the indices of the columns you wish to keep.

In [None]:
# Import numpy
import numpy as np

# Assign the filename: file
file = 'digits_header.txt'

# Load the data: data
data = np.loadtxt(file, delimiter='\t', skiprows=1, usecols=[0, 2])

# Print data
print(data)
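
To see these three arguments work together without the course file, here is a minimal sketch. It assumes a tiny, made-up tab-delimited table held in memory (np.loadtxt() accepts a file-like object such as io.StringIO as well as a filename).

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a one-line text header
raw = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

# skiprows=1 skips the header row; usecols keeps the first and third columns
data = np.loadtxt(io.StringIO(raw), delimiter='\t', skiprows=1, usecols=[0, 2])

print(data)
print(data.shape)
```

Note that skiprows counts rows to skip, while usecols holds zero-based column indices.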

# Importing different datatypes
The file seaslug.txt

- has a text header, consisting of strings
- is tab-delimited.

These data consist of the percentage of sea slug larvae that had metamorphosed in a given time period. Read more here.

Due to the header, if you tried to import it as-is using np.loadtxt(), Python would throw you a ValueError and tell you that it could not convert string to float. There are two ways to deal with this: firstly, you can set the data type argument dtype equal to str (for string).

Alternatively, you can skip the first row as we have seen before, using the skiprows argument.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Assign filename: file
file = 'seaslug.txt'

# Import file: data
data = np.loadtxt(file, delimiter='\t', dtype=str)

# Print the first element of data
print(data[0])

# Import data as floats and skip the first row: data_float
data_float = np.loadtxt(file, delimiter='\t', dtype=float, skiprows=1)

# Print the 10th element of data_float
print(data_float[9])

# Plot a scatterplot of the data
plt.scatter(data_float[:, 0], data_float[:, 1])
plt.xlabel('time (min.)')
plt.ylabel('percentage of larvae')
plt.show()
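
The two approaches can be compared side by side on a small example. This is a sketch only: the two data rows below are invented and are not taken from seaslug.txt.

```python
import io
import numpy as np

# Hypothetical tab-delimited data with a text header (values are made up)
raw = "Time\tPercent\n99\t0.067\n99\t0.133\n"

# Option 1: import everything, header included, as strings
data_str = np.loadtxt(io.StringIO(raw), delimiter='\t', dtype=str)

# Option 2: skip the header row and import the rest as floats
data_float = np.loadtxt(io.StringIO(raw), delimiter='\t', skiprows=1)

print(data_str[0])     # the header row, as strings
print(data_float[0])   # the first data row, as floats
```

With dtype=str the numbers stay strings too, so the second option is the one to use when you want to compute or plot.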

# Working with mixed datatypes (1)
Much of the time you will need to import datasets which have different datatypes in different columns; one column may contain strings and another floats, for example. The function np.loadtxt() will freak at this. There is another function, np.genfromtxt(), which can handle such structures. If we pass dtype=None to it, it will figure out what types each column should be.

Import 'titanic.csv' using the function np.genfromtxt() as follows:

data = np.genfromtxt('titanic.csv', delimiter=',', names=True, dtype=None)

Here, the first argument is the filename, the second specifies the delimiter, and the third argument, names, tells the function that there is a header. Because the data are of different types, data is an object called a structured array. NumPy arrays must contain elements that are all the same type, so the structured array solves this by being a 1D array in which each element is a row of the imported flat file. You can test this by checking out the array's shape in the shell by executing np.shape(data).

Accessing rows and columns of structured arrays is super-intuitive: to get the ith row, merely execute data[i] and to get the column with name 'Fare', execute data['Fare'].

After importing the Titanic data as a structured array (as per the instructions above), print the entire column with the name Survived to the shell. What are the last 4 values of this column?
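
If you want to experiment with structured arrays before opening titanic.csv, the following sketch may help. It assumes a three-row, made-up table (the names and values are invented, only the column names echo the Titanic file) and passes encoding='utf-8' so that the text column is read as ordinary Python strings.

```python
import io
import numpy as np

# Hypothetical comma-delimited data with mixed column types
raw = "Name,Fare,Survived\nAlice,7.25,0\nBob,71.28,1\nCarol,8.05,1\n"

data = np.genfromtxt(io.StringIO(raw), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

print(np.shape(data))      # 1D: one element per row of the file
print(data[0])             # the first row
print(data['Fare'])        # the column named 'Fare'
print(data['Survived'][-2:])  # the last two values of a column
```

Slicing a named column, as in the last line, is one way to read off the final values that the question asks about.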

In [5]:
"""Possible Answers
1,0,0,1.
1,2,0,0.
1,0,1,0.
0,1,1,1.
"""
#answer=1010

# Working with mixed datatypes (2)
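You have just used np.genfromtxt() to import data containing mixed datatypes. There is also another function, np.recfromcsv(), that behaves similarly to np.genfromtxt(), except that its defaults are delimiter=',', names=True and dtype=None. In this exercise, you'll practice using this to achieve the same result. Note that np.recfromcsv() is no longer available in the main namespace as of NumPy 2.0, where np.genfromtxt() with delimiter=',' takes its place.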

In [None]:
# Assign the filename: file
file = 'titanic.csv'

# Import file using np.recfromcsv: d
d = np.recfromcsv(file, delimiter=',', names=True, dtype=None)

# Print out first three entries of d
print(d[:3])

# Using pandas to import flat files as DataFrames (1)
In the last exercise, you were able to import flat files containing columns with different datatypes as numpy arrays. However, the DataFrame object in pandas is a more appropriate structure in which to store such data and, thankfully, we can easily import files of mixed data types as DataFrames using the pandas functions read_csv() and read_table().

In [None]:
# Import pandas as pd
import pandas as pd

# Assign the filename: file
file = 'titanic.csv'

# Read the file into a DataFrame: df
df = pd.read_csv(file)

# View the head of the DataFrame
print(df.head())

# Using pandas to import flat files as DataFrames (2)
In the last exercise, you were able to import flat files into a pandas DataFrame. As a bonus, it is then straightforward to retrieve the corresponding numpy array using the attribute values. You'll now have a chance to do this using the MNIST dataset, which is available as digits.csv.

In [None]:
# Assign the filename: file
file = 'digits.csv'

# Read the first 5 rows of the file into a DataFrame: data
data = pd.read_csv(file, nrows=5, header=None)

# Build a numpy array from the DataFrame: data_array
data_array = data.values

# Print the datatype of data_array to the shell
print(type(data_array))

# Customizing your pandas import
The pandas package is also great at dealing with many of the issues you will encounter when importing data as a data scientist, such as comments occurring in flat files, empty lines and missing values. Note that missing values are also commonly referred to as NA or NaN. To wrap up this chapter, you're now going to import a slightly corrupted copy of the Titanic dataset titanic_corrupt.txt, which

- contains comments after the character '#'
- is tab-delimited.

In [None]:
# Import matplotlib.pyplot as plt
import matplotlib.pyplot as plt

# Assign filename: file
file = 'titanic_corrupt.txt'

# Import file: data (tab-delimited, '#' starts a comment, 'Nothing' marks a missing value)
data = pd.read_csv(file, sep='\t', comment='#', na_values='Nothing')

# Print the head of the DataFrame
print(data.head())

# Plot 'Age' variable in a histogram
pd.DataFrame.hist(data[['Age']])
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()
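
The effect of sep, comment and na_values can be checked on a small in-memory table. The sketch below assumes made-up records (not the contents of titanic_corrupt.txt) with one comment line and one value recorded as 'Nothing'.

```python
import io
import pandas as pd

# Hypothetical tab-delimited data: a comment line, a header and three records
raw = (
    "# made-up passenger records\n"
    "Name\tAge\tFare\n"
    "Alice\t22\t7.25\n"
    "Bob\tNothing\t71.28\n"
    "Carol\t35\t8.05\n"
)

# comment='#' drops the comment line; na_values turns 'Nothing' into NaN
df = pd.read_csv(io.StringIO(raw), sep='\t', comment='#', na_values='Nothing')

print(df)
print(df['Age'].isna().sum())  # number of missing ages
```

Because the missing age becomes NaN rather than a string, the Age column stays numeric and can be plotted directly.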

""" Not so flat any more
In Chapter 1, you learned how to use the IPython magic command ! ls to 
explore your current working directory. You can also do this natively in 
Python using the library os, which consists of miscellaneous operating 
system interfaces.
The first line of the following code imports the library os, the second 
line stores the name of the current directory in a string called wd and 
the third outputs the contents of the directory in a list to the shell.
"""
import os
wd = os.getcwd()
os.listdir(wd)

"""
Run this code in the IPython shell and answer the following questions. 
Ignore the files that begin with a dot (.).
Check out the contents of your current directory and answer the following 
questions: (1) which file is in your directory and NOT an example of a 
flat file; (2) why is it not a flat file?
Possible Answers
database.db is not a flat file because relational databases contain 
structured relationships and flat files do not.
battledeath.xlsx is not a flat file because it is a spreadsheet consisting of 
many sheets, not a single table.
titanic.txt is not a flat file because it is a .txt, not a .csv.
"""

#answer=battledeath.xlsx is not a flat file because it is a spreadsheet consisting of 
#many sheets, not a single table.
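
Since the exercise asks you to ignore files that begin with a dot, you can also filter them out in Python rather than by eye. A minimal sketch, in which the variable name visible is just an illustrative choice:

```python
import os

# Name of the current working directory
wd = os.getcwd()

# Keep only entries whose names do not start with a dot, sorted for readability
visible = sorted(name for name in os.listdir(wd) if not name.startswith('.'))

print(visible)
```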

# Loading a pickled file
There are a number of datatypes that cannot be saved easily to flat files, such as lists and dictionaries. If you want your files to be human readable, you may want to save them as text files in a clever manner. JSONs, which you will see in a later chapter, are appropriate for Python dictionaries.

However, if you merely want to be able to import them into Python, you can serialize them. All this means is converting the object into a sequence of bytes, or a bytestream.

In this exercise, you'll import the pickle package, open a previously pickled data structure from a file and load it.

In [None]:
# Import pickle package
import pickle

# Open pickle file and load data: d
with open('data.pkl', 'rb') as file:
    d = pickle.load(file)

# Print d
print(d)

# Print datatype of d
print(type(d))
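
To see where a file like data.pkl comes from, here is a minimal round-trip sketch. It assumes a made-up dictionary and a temporary file named example.pkl (both hypothetical): the object is serialized with pickle.dump() in binary write mode and read back with pickle.load() in binary read mode.

```python
import os
import pickle
import tempfile

# A hypothetical object that does not fit naturally into a flat file
original = {'name': 'example', 'values': [1, 2, 3]}

# Serialize it to a bytestream on disk ('wb' = write, binary)
path = os.path.join(tempfile.mkdtemp(), 'example.pkl')
with open(path, 'wb') as file:
    pickle.dump(original, file)

# Load it back ('rb' = read, binary)
with open(path, 'rb') as file:
    restored = pickle.load(file)

print(restored)
print(type(restored))
```

The restored object is an equal copy of the original, with its Python type preserved.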

# Listing sheets in Excel files
Whether you like it or not, any working data scientist will need to deal with Excel spreadsheets at some point in time. You won't always want to do so in Excel, however!

Here, you'll learn how to use pandas to import Excel spreadsheets and how to list the names of the sheets in any loaded .xlsx file.

Recall from the video that, given an Excel file imported into a variable spreadsheet, you can retrieve a list of the sheet names using the attribute spreadsheet.sheet_names.

Specifically, you'll be loading and checking out the spreadsheet 'battledeath.xlsx', modified from the Peace Research Institute Oslo's (PRIO) dataset. This data contains age-adjusted mortality rates due to war in various countries over several years.

In [None]:
# Import pandas
import pandas as pd

# Assign spreadsheet filename: file
file = 'battledeath.xlsx'

# Load spreadsheet: xls
xls = pd.ExcelFile(file)

# Print sheet names
print(xls.sheet_names)

# Importing sheets from Excel files
In the previous exercises, you saw that the Excel file contains two sheets, '2002' and '2004'. The next step is to import these.

In this exercise, you'll learn how to import any given sheet of your loaded .xlsx file as a DataFrame. You'll be able to do so by specifying either the sheet's name or its index.

The spreadsheet 'battledeath.xlsx' is already loaded as xls.

In [None]:
# Load a sheet into a DataFrame by name: df1
df1 = xls.parse('2004')

# Print the head of the DataFrame df1
print(df1.head())

# Load a sheet into a DataFrame by index: df2
df2 = xls.parse(0)

# Print the head of the DataFrame df2
print(df2.head())

# Customizing your spreadsheet import
Here, you'll parse your spreadsheets and use additional arguments to skip rows, rename columns and select only particular columns.

The spreadsheet 'battledeath.xlsx' is already loaded as xls.

As before, you'll use the method parse(). This time, however, you'll add the additional arguments skiprows, names and usecols. These skip rows, name the columns and designate which columns to parse, respectively. All these arguments can be assigned to lists containing the specific row numbers, strings and column numbers, as appropriate.

In [None]:
# Parse the first sheet and rename the columns: df1
df1 = xls.parse(0, skiprows=[0], names=['Country','AAM due to War (2002)'])

# Print the head of the DataFrame df1
print(df1.head())

# Parse the first column of the second sheet and rename the column: df2
df2 = xls.parse(1, usecols=[0], skiprows=[0], names=['Country'])

# Print the head of the DataFrame df2
print(df2.head())

In [6]:
"""How to import SAS7BDAT
How do you correctly import the function SAS7BDAT() from the package sas7bdat?

Possible Answers

import SAS7BDAT from sas7bdat


from SAS7BDAT import sas7bdat


import sas7bdat from SAS7BDAT

from sas7bdat import SAS7BDAT"""
#answer=from sas7bdat import SAS7BDAT

# Importing SAS files
In this exercise, you'll figure out how to import a SAS file as a DataFrame using SAS7BDAT and pandas. The file 'sales.sas7bdat' is already in your working directory and both pandas and matplotlib.pyplot have already been imported as follows:

import pandas as pd

import matplotlib.pyplot as plt

The data are adapted from the website of the undergraduate text book Principles of Econometrics by Hill, Griffiths and Lim.

In [None]:
# Import sas7bdat package
from sas7bdat import SAS7BDAT
# Save file to a DataFrame: df_sas
with SAS7BDAT('sales.sas7bdat') as file:
    df_sas = file.to_data_frame()

# Print head of DataFrame
print(df_sas.head())

# Plot histogram of DataFrame features (pandas and pyplot already imported)
pd.DataFrame.hist(df_sas[['P']])
plt.ylabel('count')
plt.show()

# Using read_stata to import Stata files
The pandas package has been imported in the environment as pd and the file disarea.dta is in your working directory. The data consist of disease extents for several diseases in various countries (more information can be found here).

What is the correct way of using the read_stata() function to import disarea.dta into the object df?

In [7]:
"""Possible Answers

df = 'disarea.dta'

df = read_stata.pd('disarea.dta')

df = pd.read_stata('disarea.dta')

df = pd.read_stata(disarea.dta)"""
#answer=df = pd.read_stata('disarea.dta')

# Importing Stata files
Here, you'll gain expertise in importing Stata files as DataFrames using the pd.read_stata() function from pandas. The last exercise's file, 'disarea.dta', is still in your working directory.

In [None]:
# Import pandas
import pandas as pd

# Load Stata file into a pandas DataFrame: df
df = pd.read_stata("disarea.dta")

# Print the head of the DataFrame df
print(df.head())

# Plot histogram of one column of the DataFrame
pd.DataFrame.hist(df[['disa10']])
plt.xlabel('Extent of disease')
plt.ylabel('Number of countries')
plt.show()

# Using File to import HDF5 files
The h5py package has been imported in the environment and the file LIGO_data.hdf5 is loaded in the object h5py_file.

What is the correct way of using the h5py function, File(), to import the file in h5py_file into an object, h5py_data, for reading only?

In [8]:
"""Possible Answers

h5py_data = File(h5py_file, 'r')

h5py_data = h5py.File(h5py_file, 'r')

h5py_data = h5py.File(h5py_file, read)

h5py_data = h5py.File(h5py_file, 'read')"""
#answer=h5py_data = h5py.File(h5py_file, 'r')

# Using h5py to import HDF5 files
The file 'LIGO_data.hdf5' is already in your working directory. In this exercise, you'll import it using the h5py library. You'll also print out its datatype to confirm you have imported it correctly. You'll then study the structure of the file in order to see precisely what HDF groups it contains.

You can find the LIGO data plus loads of documentation and tutorials here. There is also a great tutorial on Signal Processing with the data here.

In [None]:
# Import packages
import numpy as np
import h5py

# Assign filename: file
file = "LIGO_data.hdf5"

# Load file: data
data = h5py.File(file, "r")

# Print the datatype of the loaded file
print(type(data))

# Print the keys of the file
for key in data.keys():
    print(key)

# Extracting data from your HDF5 file
In this exercise, you'll extract some of the LIGO experiment's actual data from the HDF5 file and you'll visualize it.

To do so, you'll need to first explore the HDF5 group 'strain'.

In [None]:
# Get the HDF5 group: group
group = data["strain"]

# Check out keys of group
for key in group.keys():
    print(key)

# Set variable equal to time series data: strain
strain = group['Strain'][()]  # [()] reads the whole dataset; .value was removed in h5py 3.0

# Set number of time points to sample: num_samples
num_samples = 10000

# Set time vector
time = np.arange(0, 1, 1/num_samples)

# Plot data
plt.plot(time, strain[:num_samples])
plt.xlabel('GPS Time (s)')
plt.ylabel('strain')
plt.show()

# Loading .mat files
In this exercise, you'll figure out how to load a MATLAB file using scipy.io.loadmat() and you'll discover what Python datatype it yields.

The file 'albeck_gene_expression.mat' is in your working directory. This file contains gene expression data from the Albeck Lab at UC Davis. You can find the data and some great documentation here.

In [None]:
# Import package
import scipy.io

# Load MATLAB file: mat
mat = scipy.io.loadmat("albeck_gene_expression.mat")

# Print the datatype of mat
print(type(mat))

In [9]:
"""The structure of .mat in Python
Here, you'll discover what is in the MATLAB dictionary that you loaded in the previous exercise.

The file 'albeck_gene_expression.mat' is already loaded into the variable mat. The following libraries have already been imported as follows:

import scipy.io
import matplotlib.pyplot as plt
import numpy as np

Once again, this file contains gene expression data from the Albeck Lab at UC Davis. You can find the data and some great documentation here."""

In [None]:
# Print the keys of the MATLAB dictionary
print(mat.keys())

# Print the type of the value corresponding to the key 'CYratioCyt'
print(type(mat['CYratioCyt']))

# Print the shape of the value corresponding to the key 'CYratioCyt'
print(np.shape(mat['CYratioCyt']))

# Subset the array and plot it
data = mat['CYratioCyt'][25, 5:]
fig = plt.figure()
plt.plot(data)
plt.xlabel('time (min.)')
plt.ylabel('normalized fluorescence (measure of expression)')
plt.show()
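
If you would like to explore the structure of a loaded .mat file without the Albeck data, the sketch below may be useful. It assumes a made-up 2-by-3 array stored under the hypothetical name 'measurements', written to an in-memory buffer with scipy.io.savemat() and read back with scipy.io.loadmat().

```python
import io
import numpy as np
import scipy.io

# Write a hypothetical MATLAB file into an in-memory buffer
buffer = io.BytesIO()
scipy.io.savemat(buffer, {'measurements': np.arange(6.0).reshape(2, 3)})
buffer.seek(0)

# Load it back: the result is a Python dictionary
mat = scipy.io.loadmat(buffer)
print(type(mat))

# Keys wrapped in double underscores hold file metadata; the rest are variables
user_keys = [key for key in mat if not key.startswith('__')]
print(user_keys)

# Each variable is a NumPy array
print(type(mat['measurements']))
print(np.shape(mat['measurements']))
```

Filtering out the metadata keys is a quick way to find the variable names, such as 'CYratioCyt', that actually hold data.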

In [10]:
"""Pop quiz: The relational model
Which of the following is not part of the relational model?

Possible Answers

Each row or record in a table represents an instance of an entity type.

Each column in a table represents an attribute or feature of an instance.

Every table contains a primary key column, which has a unique entry for each row.

A database consists of at least 3 tables.

There are relations between tables."""
#answer=A database consists of at least 3 tables

# Creating a database engine
Here, you're going to fire up your very first SQL engine. You'll create an engine to connect to the SQLite database 'Chinook.sqlite', which is in your working directory. Remember that to create an engine to connect to 'Northwind.sqlite', Hugo executed the command

engine = create_engine('sqlite:///Northwind.sqlite')

Here, 'sqlite:///Northwind.sqlite' is called the connection string to the SQLite database Northwind.sqlite. A little bit of background on the Chinook database: the Chinook database contains information about a semi-fictional digital media store in which media data is real and customer, employee and sales data has been manually created.

Why the name Chinook, you ask? According to their website,

The name of this sample database was based on the Northwind database. Chinooks are winds in the interior West of North America, where the Canadian Prairies and Great Plains meet various mountain ranges. Chinooks are most prevalent over southern Alberta in Canada. Chinook is a good name choice for a database that intends to be an alternative to Northwind.

In [None]:
# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# What are the tables in the database?
In this exercise, you'll once again create an engine to connect to 'Chinook.sqlite'. Before you can get any data out of the database, however, you'll need to know what tables it contains!

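To this end, you'll save the table names to a list and then print the list. Older SQLAlchemy releases offer the method table_names() on the engine for this; it is deprecated since SQLAlchemy 1.4 and removed in 2.0, so the code below uses the inspector's get_table_names() method instead.

In [None]:
# Import necessary modules
from sqlalchemy import create_engine, inspect

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Save the table names to a list: table_names
table_names = inspect(engine).get_table_names()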

# Print the table names to the shell
print(table_names)

# The Hello World of SQL Queries!
Now, it's time for liftoff! In this exercise, you'll perform the Hello World of SQL queries, SELECT, in order to retrieve all columns of the table Album in the Chinook database. Recall that the query SELECT * selects all columns.

- Open the engine connection as con using the method connect() on the engine.
- Execute the query that selects ALL columns from the Album table. Store the results in rs.
- Store all of your query results in the DataFrame df by applying the fetchall() method to the results rs.
- Close the connection!

In [None]:
# Import packages
from sqlalchemy import create_engine, text
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine connection: con
con = engine.connect()

# Perform query: rs (text() wraps the raw SQL string, as SQLAlchemy 2.0 requires)
rs = con.execute(text("SELECT * FROM Album"))

# Save results of the query to DataFrame: df
df = pd.DataFrame(rs.fetchall())

# Close connection
con.close()

# Print head of DataFrame df
print(df.head())
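
The connect, execute, fetch, close sequence can be rehearsed without Chinook.sqlite. The sketch below swaps in Python's standard-library sqlite3 module in place of SQLAlchemy so that it runs against an in-memory database; the Album table it creates and its two rows are made up for illustration.

```python
import sqlite3

# Open a connection to a throwaway in-memory database
con = sqlite3.connect(':memory:')

# Build a tiny stand-in for the Album table (hypothetical rows)
con.execute("CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT)")
con.executemany("INSERT INTO Album (Title) VALUES (?)",
                [('First',), ('Second',)])

# Perform the query and fetch every row
rs = con.execute("SELECT * FROM Album ORDER BY AlbumId")
rows = rs.fetchall()

# Column names are available from the cursor description
column_names = [col[0] for col in rs.description]

# Close the connection
con.close()

print(column_names)
print(rows)
```

Passing rows and column_names to pd.DataFrame(rows, columns=column_names) would give a DataFrame with labelled columns instead of the default integer labels.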