## Introduction to Importing Data in Python



## Course Description

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

##  Introduction and flat files

In this chapter, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas—you will learn how to use these packages to import flat files and customize your imports.

    Welcome to the course!    50 xp
    Exploring your working directory    50 xp
    Importing entire text files    100 xp
    Importing text files line by line    100 xp
    The importance of flat files in data science    50 xp
    Pop quiz: examples of flat files    50 xp
    Pop quiz: what exactly are flat files?    50 xp
    Why we like flat files and the Zen of Python    50 xp
    Importing flat files using NumPy    50 xp
    Using NumPy to import flat files    100 xp
    Customizing your NumPy import    100 xp
    Importing different datatypes    100 xp
    Working with mixed datatypes (1)    50 xp
    Working with mixed datatypes (2)    100 xp
    Importing flat files using pandas    50 xp
    Using pandas to import flat files as DataFrames (1)    100 xp
    Using pandas to import flat files as DataFrames (2)    100 xp
    Customizing your pandas import    100 xp
    Final thoughts on data import    50 xp


##  Importing data from other file types

You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this chapter, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.

    Introduction to other file types    50 xp
    Not so flat any more    50 xp
    Loading a pickled file    100 xp
    Listing sheets in Excel files    100 xp
    Importing sheets from Excel files    100 xp
    Customizing your spreadsheet import    100 xp
    Importing SAS/Stata files using pandas    50 xp
    How to import SAS7BDAT    50 xp
    Importing SAS files    100 xp
    Using read_stata to import Stata files    50 xp
    Importing Stata files    100 xp
    Importing HDF5 files    50 xp
    Using File to import HDF5 files    50 xp
    Using h5py to import HDF5 files    100 xp
    Extracting data from your HDF5 file    100 xp
    Importing MATLAB files    50 xp
    Loading .mat files    100 xp
    The structure of .mat in Python    100 xp


##  Working with relational databases in Python

In this chapter, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.

    Introduction to relational databases    50 xp
    Pop quiz: The relational model    50 xp
    Creating a database engine in Python    50 xp
    Creating a database engine    100 xp
    What are the tables in the database?    100 xp
    Querying relational databases in Python    50 xp
    The Hello World of SQL Queries!    100 xp
    Customizing the Hello World of SQL Queries    100 xp
    Filtering your database records using SQL's WHERE    100 xp
    Ordering your SQL records with ORDER BY    100 xp
    Querying relational databases directly with pandas    50 xp
    Pandas and The Hello World of SQL Queries!    100 xp
    Pandas for more complex querying    100 xp
    Advanced querying: exploiting table relationships    50 xp
    The power of SQL lies in relationships between tables: INNER JOIN    100 xp
    Filtering your INNER JOIN    100 xp
    Final Thoughts    50 xp


## Welcome to the course!







**Welcome to the first course on Importing Data in Python!**  

The instructor is Hugo Bowne-Anderson, a Data Scientist at DataCamp.  


In this course, you'll learn how to import data from a wide variety of sources, for example: (1) flat files such as .txt and .csv; (2) files native to other software, such as Excel spreadsheets, Stata, SAS, and MATLAB files; (3) relational databases such as SQLite, MySQL, and PostgreSQL.  We'll cover all of these topics in this course.  

# *******************************************************************************************************************
# *******************************************************************************************************************
First off, we're going to learn how to import basic text files, which we can broadly classify into two types: those containing plain text, such as the opening of Mark Twain's novel The Adventures of Huckleberry Finn, and those containing records, that is, table data, such as titanic.csv, in which each row is a unique passenger onboard and each column is a characteristic or feature, such as sex, cabin, and whether the passenger survived.  The latter is known as a flat file, and we'll come back to these in a minute.  


In this section, we'll figure out how to read lines from a plain text file. Let's do it.  

# *******************************************************************************************************************
To check out any plain text file, you can use Python's basic "open()" function to open a connection to the file.  To do so, assign the filename to a variable as a string, then pass the filename to "open()" along with the argument "mode='r'", which makes sure that we can only read the file (we wouldn't want to accidentally write to it).  Next, assign the text from the file to a variable "text" by applying the method ".read()" to the connection to the file.  After you do this, make sure that you close the connection to the file using the command "file.close()".  
# It's always best practice to clean while cooking!  
You can then print the file to the console and check it out using the command "print(text)".  A brief side note: if you wanted to open a file in order to write to it, you would pass the argument "mode='w'".  We won't use that in this course, as this is a course on importing data, but it is good to know.  

# *******************************************************************************************************************
You can avoid having to call "file.close()" on the connection to the file by using a with statement.  This allows you to create a context in which you can execute commands with the file open.  Once out of this clause/context, the file is no longer open and, for this reason, with is called a Context Manager.  


What you are doing here is called "binding" a variable in the context manager construct; while still within this construct, the variable file will be bound to "open(filename, 'r')".  It is best practice to use the with statement as you never have to concern yourself with closing the file again.  
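
To make the binding idea concrete, here is a minimal sketch showing that the file is open inside the with block and closed once the block ends. The sample file and the names `open_inside` and `closed_after` are assumptions made only so the example runs on its own; they are not part of the course material.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "sample.txt")
with open(filename, "w") as f:
    f.write("first line\nsecond line\n")

# Inside the with block, `file` is bound to the open file object.
with open(filename, mode="r") as file:
    text = file.read()
    open_inside = not file.closed  # the file is still open at this point

# After the block, the context manager has already closed the file.
closed_after = file.closed

print(open_inside, closed_after)
```

Note that the name `file` still exists after the block; it simply refers to a file object that has been closed.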

In the following interactive coding sessions, you'll figure out how to print file content to the console.  You'll also learn to print specific lines, which can be very useful for large files.  Then we'll be back to discuss flat files, and then I'll show you how to use the Python package NumPy to make our job of importing flat files and numerical data a far easier beast to tame.  



filename = '1984.txt'
file = open(filename, mode='r')

text = file.read()

file.close()


with open('1984.txt', 'r') as file:
    print(file.read())
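
Printing only specific lines, as mentioned above, can be sketched with the file object's `readline()` method, which returns one line per call. This is a hedged example: the sample file and its contents are assumptions used as a stand-in for a real text file.

```python
import os
import tempfile

# Hypothetical sample file standing in for a real text file.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "sample.txt")
with open(filename, "w") as f:
    f.write("line 1\nline 2\nline 3\nline 4\n")

# Each call to readline() returns the next line, including its newline.
with open(filename, mode="r") as file:
    first = file.readline()
    second = file.readline()

# end="" avoids printing a second newline after each line.
print(first, end="")
print(second, end="")
```

Because only two lines are read, the rest of the file is never loaded into memory.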

# By the way, as I recall, Hugo told us how to read a huge file line by line into memory, and the pandas read_csv() function provides similar functionality.  I think we can set up the program to read a huge .txt file 1000 lines per iteration or so.  


# *******************************************************************************************************************

# In pandas.read_csv(), the "chunksize=" argument lets us iterate chunk by chunk
# In plain file operations, we can use for line in file, XXXXX
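
One possible way to read a plain text file a fixed number of lines at a time is sketched below, using only the standard library. The helper `read_in_batches`, the batch size of 1000, and the generated sample file are all assumptions for illustration; this is not the method shown in the course.

```python
import os
import tempfile
from itertools import islice


def read_in_batches(path, batch_size=1000):
    """Yield lists of at most batch_size lines from the file at path."""
    with open(path, mode="r") as file:
        while True:
            # islice pulls the next batch_size lines from the file iterator.
            batch = list(islice(file, batch_size))
            if not batch:
                break
            yield batch


# Hypothetical sample file with 2500 short lines.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "big.txt")
with open(filename, "w") as f:
    for i in range(2500):
        f.write(f"row {i}\n")

batch_sizes = [len(batch) for batch in read_in_batches(filename, batch_size=1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Only one batch is held in memory at a time, which is the same idea as iterating over chunks with `chunksize=` in pandas.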


Don't just finish the course; keep thinking.

filename = 'alice_in_wonderland.txt'
file = open(filename, mode='r')

text = file.read()

file.close()

print(text[:590])




with open('alice_in_wonderland.txt', 'r') as file:
    text = file.read()
    print(text[:590])



#####################################################################################################################
abc = []
with open('alice_in_wonderland.txt', 'r') as file:
    for line in file:
        abc.append(line)
        
print(abc[:5])

Alice's Adventures in Wonderland

                ALICE'S ADVENTURES IN WONDERLAND

                          Lewis Carroll

               THE MILLENNIUM FULCRUM EDITION 3.0




                            CHAPTER I

                      Down the Rabbit-Hole


  Alice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do:  once or twice she had
peeped into the book her sister was reading, but it had no
pictures or conversations in it, `and what is the use of a book,'
thought Alice `without pictures or conversation?'

  So she was consider

## Exploring your working directory

In order to import data into Python, you should first have an idea of what files are in your working directory.

IPython, which is running on DataCamp's servers, has a bunch of cool commands, including its magic commands. For example, starting a line with ! gives you complete system shell access. This means that the IPython magic command ! ls will display the contents of your current directory. Your task is to use the IPython magic command ! ls to check out the contents of your current directory and answer the following question: which of the following files is in your working directory?
Instructions
50 XP
Possible Answers

    huck_finn.txt
    titanic.csv
    moby_dick.txt
    

In [1]:
ls
moby_dick.txt
In [2]:
! ls
moby_dick.txt
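
Outside IPython, the same check can be sketched in plain Python with the standard `os` module. This is an alternative to `! ls`, not what the exercise asks for, and the output depends on whatever directory the code runs in.

```python
import os

# The directory Python is currently working in.
cwd = os.getcwd()

# Names of the files and folders in that directory, sorted for readability.
entries = sorted(os.listdir(cwd))

print(cwd)
for name in entries:
    print(name)
```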


## Importing entire text files

In this exercise, you'll be working with the file moby_dick.txt. It is a text file that contains the opening sentences of Moby Dick, one of the great American novels! Here you'll get experience opening a text file, printing its contents to the shell and, finally, closing it.
Instructions
100 XP

    Open the file moby_dick.txt as read-only and store it in the variable file. Make sure to pass the filename enclosed in quotation marks ''.
    Print the contents of the file to the shell using the print() function. As Hugo showed in the video, you'll need to apply the method read() to the object file.
    Check whether the file is closed by executing print(file.closed).
    Close the file using the close() method.
    Check again that the file is closed as you did above.
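
The steps above can be sketched as follows. Since the real moby_dick.txt is not available here, the sketch first writes a placeholder file with the same name into a temporary directory; that placeholder, and the names `closed_before` and `closed_after`, are assumptions added for illustration.

```python
import os
import tempfile

# Placeholder standing in for the real moby_dick.txt (assumption).
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "moby_dick.txt")
with open(filename, "w") as f:
    f.write("Placeholder text standing in for the real file.\n")

# Open the file as read-only and store it in the variable file.
file = open(filename, mode="r")

# Print the contents of the file.
print(file.read())

# Check whether the file is closed (it is not, yet).
closed_before = file.closed
print(closed_before)

# Close the file, then check again.
file.close()
closed_after = file.closed
print(closed_after)
```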
