# Python Data Analysis

## Outline of the Book

Each numbered part of this book focuses on a particular package or tool that contributes a fundamental piece of the Python data analysis story, and is broken into short self-contained chapters that each discuss a single concept:

- *Part I, Introduction to NumPy*, focuses on the NumPy library, which provides the `ndarray` for efficient storage and manipulation of dense data arrays in Python.
- *Part II, Data Manipulation with Pandas*, introduces the Pandas library, which provides the `DataFrame` for efficient storage and manipulation of labeled/columnar data in Python.
- *Part III, Visualization with Matplotlib*, concentrates on Matplotlib, a library that provides capabilities for a flexible range of data visualizations in Python.

>The PyData world is certainly much larger than these three packages, and is growing every day.
With this in mind, I make every attempt throughout this book to provide references to other interesting efforts, projects, and packages that are pushing the boundaries of what can be done in Python.
Nevertheless, the packages I concentrate on are currently fundamental to much of the work being done in the Python data analysis space, and I expect they will remain important even as the ecosystem continues growing around them.

The chapters on NumPy and Pandas outline techniques for effectively loading, storing, and manipulating in-memory data in Python.
The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
***Despite this apparent heterogeneity, many datasets can be represented fundamentally as arrays of numbers***.

For example, images (particularly digital images) can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.
Sound clips can be thought of as one-dimensional arrays of intensity versus time.
Text can be converted in various ways into numerical representations, such as binary digits representing the frequency of certain words or pairs of words.
No matter what the data is, the first step in making it analyzable will be to transform it into arrays of numbers.
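
As a rough illustration of this idea, the sketch below represents a tiny image, a short sound clip, and a sentence as plain Python lists of numbers. All values, and the four-word `vocabulary`, are made up for the example; real datasets would be loaded from files.

```python
from collections import Counter

# A tiny 3x4 "grayscale image": each number is a pixel brightness (0-255).
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]

# A short "sound clip": intensity sampled at regular time steps.
sound = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

# A sentence turned into word counts over a fixed (hypothetical) vocabulary.
text = "the cat sat on the mat"
vocabulary = ["cat", "dog", "mat", "the"]
counts = Counter(text.split())
word_vector = [counts[word] for word in vocabulary]

print(len(image), len(image[0]))  # rows and columns of the image
print(len(sound))                 # number of samples
print(word_vector)                # one count per vocabulary word
```

In each case the original object has been reduced to numbers in a regular layout, which is the form the tools in the following sections operate on.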

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science.
We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package and the Pandas package.

## NumPy

NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data **buffers**.
In some ways, NumPy arrays are like Python's built-in `list` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, **so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you**.
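
To give a feel for why a dense buffer can be stored more efficiently than a list, here is a minimal sketch that does not use NumPy at all. It relies on the standard library's `array` module as a stand-in for a dense buffer, and on `sys.getsizeof`, whose exact numbers depend on the interpreter and platform (CPython is assumed).

```python
import sys
from array import array

n = 1000
as_list = list(range(n))          # a list of references to separate int objects
as_buffer = array("q", range(n))  # one contiguous buffer of fixed-size integers

# Approximate memory: the list container plus every int object it refers to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The dense buffer stores the raw values back to back.
buffer_bytes = as_buffer.itemsize * len(as_buffer)

print(list_bytes, buffer_bytes)
```

The list pays for a full Python object per item, while the buffer pays only for the raw values. NumPy arrays are built on the same principle.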

### The keys to understanding NumPy
- Python List
- Arrays

### Python List

- Python knows a number of compound data types, used to group together other values. The most versatile is the `list`, which can be written as a list of **comma-separated values** (items) between **square brackets**. Lists might contain items of different types, but usually the items all have the same type.

```
list1 = ['John', 'Paul', 'George', 'Ringo']
```

- Lists are mutable ordered containers of other objects.
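
A short sketch of what "mutable" and "ordered" mean in practice. The list contents and the names `beatles` and `frozen` are only illustrative. The tuple at the end is included as a contrast, since tuples cannot be changed after creation.

```python
beatles = ['John', 'Paul', 'George']

beatles.append('Ringo')     # lists can grow in place
beatles[0] = 'John Lennon'  # and individual items can be reassigned
print(beatles)

# Order is preserved: items stay where they were put.
print(beatles.index('George'))

# A tuple looks similar but is immutable, so item assignment fails.
frozen = ('John', 'Paul')
try:
    frozen[0] = 'Pete'
except TypeError as error:
    print('tuples cannot be changed:', error)
```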

In [39]:
squares = [1, 4, 9, 16, 25, 78]
squares

[1, 4, 9, 16, 25, 78]

In [19]:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
letters

['a', 'b', 'c', 'd', 'e', 'f', 'g']

In [41]:
mixed = [1, 4, 9, 16, 25, 78, 'a', 'b', 'c', 'd', 'e', 'f', 'g']
mixed

[1, 4, 9, 16, 25, 78, 'a', 'b', 'c', 'd', 'e', 'f', 'g']

#### Accessing data from a List
- index
- slice

##### Index

In [48]:
squares[0] # item at position 0

1

In [52]:
squares[3] # item at position 3

16

In [49]:
squares[-1] # last item

78

In [50]:
squares[-2] # second-to-last item

25
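
The relationship between positive and negative indices can be checked directly. The sketch below reuses the `squares` list from the cells above and also shows what happens when an index falls outside the list.

```python
squares = [1, 4, 9, 16, 25, 78]

# A negative index -k refers to the same item as len(squares) - k.
for k in range(1, len(squares) + 1):
    assert squares[-k] == squares[len(squares) - k]

# An index outside the list raises IndexError rather than returning a default.
try:
    squares[6]
except IndexError as error:
    print('out of range:', error)
```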

##### Slice

In [35]:
squares[0:2] # items from position 0 (included) to 2 (excluded)

[1, 4]

In [44]:
squares[2:5] # items from position 2 (included) to 5 (excluded)

[9, 16, 25]

In [42]:
squares[4:] # items from position 4 (included) to the end

[25, 78]

In [43]:
squares[-3:] # items from the third-last (included) to the end

[16, 25, 78]

In [45]:
squares[:] # a shallow copy of the whole list

[1, 4, 9, 16, 25, 78]

In [46]:
squares[::2] # 2 is the step of the slice: every second item

[1, 9, 25]

In [47]:
squares + [36, 49, 64, 81, 100] # concatenation

[1, 4, 9, 16, 25, 78, 36, 49, 64, 81, 100]

One way to remember how slices work is to think of the indices as pointing between items, with the left edge of the first item numbered 0.
For example, for a seven-item sequence holding the letters of the word "squares":
```
 +---+---+---+---+---+---+---+
 | s | q | u | a | r | e | s |
 +---+---+---+---+---+---+---+
 0   1   2   3   4   5   6   7
-7  -6  -5  -4  -3  -2  -1
```
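
The "indices point between items" picture leads to a few properties that can be verified in code. This is a small sketch using the letters of "squares" as a seven-item list; the name `word` is just for illustration.

```python
word = list('squares')  # ['s', 'q', 'u', 'a', 'r', 'e', 's']

# For in-range, non-negative i <= j, the slice word[i:j] holds j - i items.
assert len(word[2:5]) == 5 - 2

# Splitting at any position and rejoining gives back the original.
for i in range(len(word) + 1):
    assert word[:i] + word[i:] == word

# Unlike indexing, out-of-range slice bounds are handled gracefully.
print(word[4:42])  # from position 4 to the end
print(word[42:])   # empty list
```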

#### List Constructor Function

>The list() function can be used to construct a list from an iterable; this means that it can construct a list from a tuple, a dictionary (which yields its keys), or a set. It can also construct a list from anything else that implements the iterable protocol.

The signature of the **list() function** is:
```
list(iterable)
```
For example:

In [53]:
vowelTuple = ('a', 'e', 'i', 'o', 'u')
print(list(vowelTuple))

['a', 'e', 'i', 'o', 'u']
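
The same constructor works on other iterables. The sketch below uses made-up inputs. Note that a set has no guaranteed order, so the resulting list may come out in any order.

```python
vowel_string = 'aeiou'
vowel_dict = {'a': 1, 'e': 5, 'i': 9}
vowel_set = {'o', 'u'}

from_string = list(vowel_string)  # one item per character
from_dict = list(vowel_dict)      # iterating a dict yields its keys
from_set = list(vowel_set)        # order is not guaranteed for sets
from_range = list(range(3))       # any iterable works, including range
from_generator = list(c.upper() for c in vowel_string)

print(from_string)
print(from_dict)
print(sorted(from_set))           # sorted for a stable display
print(from_range)
print(from_generator)
```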


#### List Comprehensions
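List comprehensions take their form from set-builder notation in mathematics (set theory): an expression, followed by the values it ranges over and, optionally, a condition.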

Example 1:

In [54]:
squares = []

for x in range(10):
    squares.append(x**2)
    
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [57]:
squares2 = list(map(lambda x: x**2, range(10)))
squares2

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [60]:
# the same result with a list comprehension
squares3 = [x**2 for x in range(10)]
squares3

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
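
A comprehension can also filter its input with a trailing `if` clause. As a small sketch, the squares of the even numbers below 10 can be built in one expression, shown next to the equivalent explicit loop. The variable names are only for illustration.

```python
# Squares of the even numbers below 10: an expression, a loop, and a condition.
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)

# The equivalent explicit loop, for comparison.
even_squares_loop = []
for x in range(10):
    if x % 2 == 0:
        even_squares_loop.append(x**2)

print(even_squares == even_squares_loop)
```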

Example 2:

In [61]:
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

In [64]:
combs = []
for x in [1,2,3]:
    for y in [3,1,4]:
        if x != y:
            combs.append((x, y))
combs

[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

Example 3:

In [17]:
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    ]


In [7]:
[[j[i] for j in matrix] for i in range(4)]

[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

In [68]:
[j[2] for j in matrix] # one way to extract a column from a nested list

[3, 7, 11]

In [69]:
list(zip(*matrix))

[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
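
Note that `zip(*matrix)` produces tuples rather than lists. If a list of lists is needed, as in the nested comprehension, one possible sketch is to wrap each column in `list()`:

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# zip(*matrix) unpacks the rows as separate arguments and pairs up their items.
transposed = [list(column) for column in zip(*matrix)]
print(transposed)

# Transposing twice returns the original matrix.
restored = [list(row) for row in zip(*transposed)]
print(restored == matrix)
```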

**Miscellaneous**

In [30]:
cubes = [1, 8, 27, 65, 125]
4 ** 3  # the cube of 4 is 64, not 65!

64

In [32]:
cubes[3] = 64 # replace the wrong value
cubes

[1, 8, 27, 64, 125]
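
Because lists are mutable, they can also be extended or edited several items at a time. A brief sketch, continuing with the corrected `cubes` list:

```python
cubes = [1, 8, 27, 64, 125]

cubes.append(6 ** 3)       # add the cube of 6 at the end
cubes[1:3] = [0, 0]        # replace a slice of items in place
print(cubes)               # [1, 0, 0, 64, 125, 216]

cubes[1:3] = []            # remove those items by assigning an empty list
print(cubes)               # [1, 64, 125, 216]
print(len(cubes))          # 4
```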

### Arrays

To understand arrays in NumPy, we need to answer the following questions:
- How do we create arrays with NumPy?
- What are the features of NumPy arrays?
- Do NumPy arrays support computation? (Support your answer with examples.)
- Which methods can be used to access data in NumPy arrays?


#### Creating Arrays with NumPy