# Importing Data in Python - Part 1


## Reading a text file

The simplest way to read a text file is with the built-in open() function: open the file, read its contents, then close it.

In [4]:
filename = 'README.md'

file = open(filename, mode='r')  # mode='r' opens the file for reading (the default)
text = file.read()
file.close()  # do not forget to close the file
print(text)

# Data-Science-with-Python
Notebooks with notes and examples from the "data scientist with python" carrer track at datacamp.com



You can use a context manager (a with statement) so you don't have to close the file yourself: the file is closed automatically when the with block ends, even if an error occurs inside it.

In [6]:
filename = 'README.md'

with open(filename, mode='r') as file:  # mode='r' opens the file for reading
    text = file.read()
print(text)

# Data-Science-with-Python
Notebooks with notes and examples from the "data scientist with python" carrer track at datacamp.com
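A minimal sketch of the context-manager behaviour, assuming a throwaway temporary file (created with the standard-library tempfile module, with made-up contents) in place of README.md. It shows that iterating over the file object yields one line at a time, and that the file reports closed as True once the with block ends.

```python
import os
import tempfile

# Create a small throwaway text file (a stand-in for README.md; contents are made up)
with tempfile.NamedTemporaryFile('w', suffix='.md', delete=False) as tmp:
    tmp.write('# Title\nfirst line\nsecond line\n')
    path = tmp.name

with open(path, mode='r') as file:
    # Iterating over the file object yields one line at a time, newline included
    lines = [line.rstrip('\n') for line in file]

print(lines)        # ['# Title', 'first line', 'second line']
print(file.closed)  # True: leaving the with block closed the file

os.remove(path)  # clean up the temporary file
```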



## Magic Commands

https://ipython.readthedocs.io/en/stable/overview.html

IPython provides magic commands (prefixed with % or %%) as well as the ! shell escape: if you start a line with !, the rest of the line is passed to the system shell and its output is shown in the notebook.

In [1]:
! ls

1 - Intro to Python for Data Science.ipynb
2 - Intermediate Python for Data Science.ipynb
3 - Data Science Toolbox - Part 1.ipynb
4 - Data Science Toolbox - Part 2.ipynb
5 - Importing Data in Python - Part 1.ipynb
README.md
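The ! syntax only works inside IPython. A rough plain-Python counterpart of ! ls is sketched below with the standard-library **os.listdir()**; it is an approximation only (no ls options, and sorting is by character code rather than by locale).

```python
import os

# List the current directory's entries, skipping hidden ones (plain `ls` hides them too)
entries = sorted(name for name in os.listdir('.') if not name.startswith('.'))

for name in entries:
    print(name)
```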


## Flat Files

Flat files are basic text files, such as CSV files, in which each line holds one record: a row of fields or attributes.
To import flat files we usually use NumPy (when all columns have the same data type) or pandas.

### NumPy

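We can use the NumPy functions **loadtxt()** and **genfromtxt()**. **loadtxt()** works well for clean, purely numeric data, while **genfromtxt()** can also handle missing values.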

In [None]:
# Import numpy
import numpy as np

# Assign the filename: file
file = 'digits_header.txt'

# Load the data: data
data = np.loadtxt(file, delimiter='\t', skiprows=1, usecols=[1,3])

# Print data
print(data)
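Since digits_header.txt is not reproduced here, the sketch below uses a small made-up comma-separated sample held in an io.StringIO buffer (both functions accept file-like objects as well as filenames). It illustrates the main practical difference: **genfromtxt()** fills an empty field with nan, whereas **loadtxt()** expects every field to be present.

```python
from io import StringIO

import numpy as np

# Made-up sample: a header row, then two records; the second record has an empty middle field
raw = "a,b,c\n1,2,3\n4,,6\n"

# skip_header=1 is genfromtxt's counterpart to loadtxt's skiprows=1
data = np.genfromtxt(StringIO(raw), delimiter=',', skip_header=1)

print(data.shape)            # (2, 3)
print(np.isnan(data[1, 1]))  # True: the empty field became nan
```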


### Pandas

Pandas provides the DataFrame, a two-dimensional labeled data structure whose columns can have different types. DataFrames support manipulation, slicing, reshaping, grouping, joining, merging, and more.
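A small hand-made DataFrame (all values made up) shows columns of different types living side by side, plus label-based slicing and a group-by aggregation.

```python
import pandas as pd

# Each column keeps its own dtype: strings, integers and floats
df = pd.DataFrame({
    'Name': ['Allen', 'Brown', 'Clark'],
    'Pclass': [1, 2, 1],
    'Age': [29, 35, 40],
    'Fare': [211.3, 30.0, 7.25],
})
print(df.dtypes)

# Label-based slicing: names and fares of passengers older than 30
older = df.loc[df['Age'] > 30, ['Name', 'Fare']]
print(older)

# Group by: mean fare per passenger class
mean_fare = df.groupby('Pclass')['Fare'].mean()
print(mean_fare)
```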

To import a flat file into a DataFrame, call the **read_csv()** function from pandas.

In [None]:
# Import pandas as pd
import pandas as pd

# Assign the filename: file
file = 'titanic.csv'

# Read the file into a DataFrame: df
df = pd.read_csv(file)

# View the head of the DataFrame
print(df.head())
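titanic.csv is not included here, so this sketch runs **read_csv()** on a made-up in-memory stand-in (the column names are chosen for illustration). By default the first line becomes the column header, head() shows the first 5 rows, and the nrows parameter limits how many data rows are read, which is handy for previewing a large file.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for titanic.csv; the first line is used as the header
raw = """PassengerId,Survived,Pclass,Age
1,0,3,22
2,1,1,38
3,1,3,26
"""

df = pd.read_csv(StringIO(raw))
print(df.shape)          # (3, 4)
print(list(df.columns))  # ['PassengerId', 'Survived', 'Pclass', 'Age']

# Read only the first 2 data rows
preview = pd.read_csv(StringIO(raw), nrows=2)
print(preview.shape)     # (2, 4)
```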


In [None]:
# Import pandas and matplotlib.pyplot (pandas repeated so the cell runs on its own)
import pandas as pd
import matplotlib.pyplot as plt

# Assign filename: file
file = 'titanic_corrupt.txt'

# Import the tab-separated file, skipping '#' comments and reading 'Nothing' as missing: data
data = pd.read_csv(file, sep="\t", comment='#', na_values=['Nothing'])

# Print the head of the DataFrame
print(data.head())

# Plot 'Age' variable in a histogram
data[['Age']].hist()
plt.xlabel('Age (years)')
plt.ylabel('count')
plt.show()
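The call above combines three **read_csv()** options: sep='\t' for tab-separated values, comment='#' so text after a # is not parsed (a line starting with # is skipped entirely), and na_values=['Nothing'] so the string 'Nothing' is read as NaN. titanic_corrupt.txt is not included here, so this sketch applies the same options to a small made-up sample.

```python
from io import StringIO

import pandas as pd

# Made-up tab-separated sample: a full-line comment, and 'Nothing' in place of a missing age
raw = "# sample passengers\nName\tAge\tFare\nAllen\t29\t211.3\nBrown\tNothing\t30.0\nClark\t40\t7.25\n"

data = pd.read_csv(StringIO(raw), sep='\t', comment='#', na_values=['Nothing'])

print(data.shape)                # (3, 3): the comment line was skipped
print(data['Age'].isna().sum())  # 1: 'Nothing' was converted to NaN
print(data['Age'].dtype)         # float64, because NaN forces a float column
```

Without na_values, 'Nothing' would stay in the Age column as text and the column would not be parsed as numbers, so the histogram of ages would not work as intended.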
