# **DataCamp.Course_048_Introduction to Importing Data in Python**

### **Course Description**

As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.

## **Introduction and flat files (Module 01-048)**

#### **Welcome to the course!**

1. Import data

- Flat files, e.g. .txts, .csvs
- Files from other software
- Relational databases

- Flat file
e.g.:   Plain text files
        Table data: titanic.csv

2. Reading a text file
filename = 'huck_finn.txt'
file = open(filename, mode='r') # 'r' is to read
text = file.read()
file.close()

3. Printing a text file
print(text)

4. Writing to a file
filename = 'huck_finn.txt'
file = open(filename, mode='w') # 'w' is to write
file.close()

5. Context manager **with**
with open('huck_finn.txt', 'r') as file:
    print(file.read())

6. In the exercises, you’ll:
- Print files to the console
- Print specific lines
- Discuss flat files
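
The read, write, and context-manager steps above can be sketched end to end. This is a minimal sketch, assuming a scratch file named `demo.txt` in a temporary directory (a hypothetical name, chosen so that no real file such as `huck_finn.txt` is overwritten). Note that opening an existing file with `mode='w'` empties it.

```python
import os
import tempfile

# Hypothetical scratch file; a temporary directory keeps real files untouched.
tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, 'demo.txt')

# Write: mode='w' creates the file, or empties it if it already exists.
with open(filename, mode='w') as file:
    file.write('first line\nsecond line\n')

# Read: the context manager closes the file when the block ends.
with open(filename, mode='r') as file:
    text = file.read()

print(text)
print(file.closed)  # the file is closed once the with block has been left
```

Using `with` for both steps means `close()` never has to be called by hand, even if an error occurs inside the block.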

**Exploring your working directory**

In order to import data into Python, you should first have an idea of what files are in your working directory.

IPython, which is running on DataCamp's servers, has a bunch of cool commands, including its *magic commands*. For example, starting a line with ! gives you complete system shell access. This means that the IPython magic command ! ls will display the contents of your current directory. Your task is to use the IPython magic command ! ls to check out the contents of your current directory and answer the following question: which of the following files is in your working directory?

In [1]:
! ls

'ls' is not recognized as an internal or external command,
operable program or batch file.
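
The error above appears because `ls` is not a command in the Windows shell. As a portable alternative, the directory can be listed from Python itself. A minimal sketch using only the standard library:

```python
import os

# Current working directory of the Python process
cwd = os.getcwd()

# Names of the files and folders in it, sorted for stable output
entries = sorted(os.listdir(cwd))

print(cwd)
for name in entries:
    print(name)
```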


**Importing entire text files**

In this exercise, you'll be working with the file moby_dick.txt. It is a text file that contains the opening sentences of Moby Dick, one of the great American novels! Here you'll get experience opening a text file, printing its contents to the shell and, finally, closing it.

STEPS
    Open the file moby_dick.txt as read-only and store it in the variable file. Make sure to pass the filename enclosed in quotation marks ''.
    Print the contents of the file to the shell using the print() function. As Hugo showed in the video, you'll need to apply the method read() to the object file.
    Check whether the file is closed by executing print(file.closed).
    Close the file using the close() method.
    Check again that the file is closed as you did above.


In [6]:
filename = r'G:\My Drive\Data Science\Datacamp_Notebook\Datacamp_Notebook\datasets\moby_dick.txt'

# Open a file: file
file = open(filename, mode='r')

# Print it
print(file.read())

# Check whether file is closed
print(file.closed)

# Close file
file.close()

# Check whether file is closed
print(file.closed)


xx MOBY-DICK 

' " My God ! Mr. Chace, what is the matter ? " I answered, 
" We have been stove by a whale." ! 

Narrative of the Shipwreck of the Whale Ship 
Essex of Nantucket, which was attacked and 
finally destroyed by a large Sperm Whale in 
the Pacific Ocean. By Owen Chace of Nan- 
tucket, first mate of said vessel. New York, 
1821. 

' A mariner sat in the shrouds one night, 

The wind was piping free ; 

Now bright, now dimmed, was the moonlight pale, 
And the phospher gleamed in the wake of the whale, 
As it floundered in the sea.' 

Elizabeth Oakes Smith. 

' The quantity of line withdrawn from the different boats 
engaged in the capture of this one whale, amounted alto- 
gether to 10,440 yards or nearly six English miles. * * * 

t Sometimes the whale shakes its tremendous tail in the 
air, which, cracking like a whip, resounds to the distance of 
three or four miles.' Scoresby. 

1 Mad with the agonies he endures from these fresh attacks, 
the infuriated Sperm Whale rolls 

**Importing text files line by line**

For large files, you may not want to print all of their content to the shell; you may wish to print only the first few lines. Enter the `readline()` method, which allows you to do this. When a file object called `file` is open, you can print out the first line by executing `file.readline()`. If you execute the same command again, the second line will print, and so on.

In the introductory video, Hugo also introduced the concept of a **context manager**. He showed that you can bind a variable `file` by using a context manager construct:

`with open('huck_finn.txt') as file:`

While still within this construct, the variable `file` will be bound to `open('huck_finn.txt')`; thus, to print the first line of the file to the shell, all the code you need to execute is:

`with open('huck_finn.txt') as file:`
`    print(file.readline())`

You'll now use these tools to print the first few lines of `moby_dick.txt`!

STEPS
    Open moby_dick.txt using the with context manager and the variable file.
    Print the first three lines of the file to the shell by using readline() three times within the context manager.

In [8]:
# Read & print the first 3 lines
with open(filename) as file:
    print(file.readline())
    print(file.readline())
    print(file.readline())

xx MOBY-DICK 



' " My God ! Mr. Chace, what is the matter ? " I answered, 
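
The blank lines in the output above arise because `readline()` keeps each line's trailing newline and `print()` adds another. A minimal sketch, assuming a small in-memory stand-in for the file (the `io.StringIO` object and its text are illustrative, not the course data), that prints the first three lines without the extra spacing:

```python
import io
from itertools import islice

# In-memory stand-in for an open text file (illustrative content)
file = io.StringIO('line one\nline two\nline three\nline four\n')

# islice ends the iteration after three lines; rstrip drops the newline
first_three = [line.rstrip('\n') for line in islice(file, 3)]

for line in first_three:
    print(line)
```

With a real file, the `io.StringIO(...)` object would be replaced by `open(filename)` inside a `with` block.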



#### **The importance of flat files in data science**

1. Flat files
titanic.csv

- Text files containing records
- That is, table data
- Record: row of fields or attributes
- Column: feature or attribute
- Header

2. File extension
.csv - Comma separated values
.txt - Text file
commas, tabs - Delimiters

3. Tab-delimited file
MNIST.txt

4. How do you import flat files?
- Two main packages: NumPy, pandas
- Here, you’ll learn to import:
Flat files with numerical data (MNIST)
Flat files with numerical data and strings (titanic.csv)
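
The vocabulary above (header, record, column, delimiter) can be made concrete with a tiny comma-delimited table. A minimal sketch using the standard-library `csv` module; the sample rows are made up for illustration and are not the real `titanic.csv`:

```python
import csv
import io

# Made-up sample in the shape of a flat file: one header row, then records
raw = 'Name,Sex,Age\nAlice,female,29\nBob,male,35\n'

reader = csv.reader(io.StringIO(raw), delimiter=',')
header = next(reader)      # first row names the columns
records = list(reader)     # each remaining row is one record

# A column is the same field taken from every record
ages = [row[header.index('Age')] for row in records]

print(header)
print(records[0])
print(ages)
```

Every field comes back as a string here, which is one reason NumPy and pandas are preferred for numerical work.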

**Why we like flat files and the Zen of Python**

In PythonLand, there are currently hundreds of Python Enhancement Proposals, commonly referred to as PEPs. PEP8, for example, is a standard style guide for Python, written by our sensei Guido van Rossum himself. It is the basis for how we here at DataCamp ask our instructors to style their code. Another one of my favorites is PEP20, commonly called the Zen of Python. Its abstract is as follows:

    Long time Pythoneer Tim Peters succinctly channels the BDFL's guiding principles for Python's design into 20 aphorisms, only 19 of which have been written down.

The acronym BDFL stands for Benevolent Dictator For Life, a title given to Guido van Rossum. You can print the Zen of Python in your shell by typing `import this` into it! You're going to do this now and the 5th aphorism (line) will say something of particular interest.

The question you need to answer is: what is the 5th aphorism of the Zen of Python?

In [9]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


#### **Importing flat files using NumPy**

1. Why NumPy?

- NumPy arrays: standard for storing numerical data
- Essential for other packages: e.g. scikit-learn
- loadtxt()
- genfromtxt()

2. Importing flat files using NumPy
import numpy as np
filename = 'MNIST.txt'
data = np.loadtxt(filename, delimiter=',')
data

3. Customizing your NumPy import
import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1)
print(data)

import numpy as np 
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])
print(data)

data = np.loadtxt(filename, delimiter=',', dtype=str)

4. Mixed datatypes
titanic.csv

- Files with mixed datatypes do not import cleanly with NumPy
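
The options above can be tried without the course files. A minimal sketch, assuming small in-memory tables in place of `MNIST_header.txt` and `titanic.csv` (all values are invented for illustration):

```python
import io
import numpy as np

# Invented numeric table with a header row
numeric = 'label,p1,p2\n1,0,255\n7,128,64\n'

# skiprows=1 skips the header; usecols keeps only columns 0 and 2
data = np.loadtxt(io.StringIO(numeric), delimiter=',', skiprows=1, usecols=[0, 2])
print(data)

# Invented mixed-type table: the strings cannot be parsed as floats
mixed = 'Alice,female,29\nBob,male,35\n'
float_import_failed = False
try:
    np.loadtxt(io.StringIO(mixed), delimiter=',')
except ValueError:
    float_import_failed = True
print('float import failed:', float_import_failed)

# dtype=str imports every entry as a string instead
as_text = np.loadtxt(io.StringIO(mixed), delimiter=',', dtype=str)
print(as_text)
```

Importing everything as strings avoids the error but loses the numeric type of the age column, which is why mixed tables are awkward in a single NumPy array.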

**Using NumPy to import flat files**

In this exercise, you're now going to load the MNIST digit recognition dataset using the NumPy function `loadtxt()` and see just how easy it can be:

    The first argument will be the filename.
    The second will be the delimiter which, in this case, is a comma.

You can find more information about the MNIST dataset on the webpage of Yann LeCun, who at the time of the course was Director of AI Research at Facebook and Founding Director of the NYU Center for Data Science, among many other things.

STEPS

    Fill in the arguments of np.loadtxt() by passing file and a comma ',' for the delimiter.
    Fill in the argument of print() to print the type of the object digits. Use the function type().
    Execute the rest of the code to visualize one of the rows of the data.
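
The steps above can be sketched without the course file. This assumes a tiny invented table in which the first column is a label and the remaining four columns are pixel values (real MNIST rows are much longer). The names `file` and `digits` follow the exercise text, and the plotting step is replaced by a reshape so that the sketch needs only NumPy:

```python
import io
import numpy as np

# Invented stand-in for the digits file: a label, then four pixel values per row
file = io.StringIO('5,0,255,255,0\n3,12,0,0,200\n')

# Load the comma-delimited data into a NumPy array
digits = np.loadtxt(file, delimiter=',')

# Print the type of the imported object
print(type(digits))

# Take one row, drop the label, and reshape the pixels into a 2x2 image
im = digits[1, 1:]
im_sq = np.reshape(im, (2, 2))
print(im_sq)
```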

## **Importing data from other file types (Module 02-048)**


## **Working with relational databases in Python (Module 03-048)**
