# Introduction to Pandas, Series and DataFrame

## 1. Data Analysis
- Data Analysis and Pandas
    - Pandas is a Python library for reading data from a file, organizing it in tabular form, and cleaning or manipulating it as needed so that you can draw useful information from it.

## 2. Introduction to Pandas
- The name derives from "panel data" and is also a play on "Python data analysis"; it is often expanded as Python Data Analysis Library.
- It is a tool used by data scientists to:
    - read,
    - write,
    - manipulate, and
    - analyze data.
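
As a rough illustration of those four verbs, the sketch below reads a small CSV from an in-memory string (standing in for a file), manipulates it, summarizes it, and writes it back out. The column names `city` and `sales` and the sample values are made up for this example.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "city,sales\nOslo,10\nLima,20\nOslo,30\n"

# Read: parse the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a derived column.
df["sales_doubled"] = df["sales"] * 2

# Analyze: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Write: serialize back to CSV text (pass a path instead to write a file).
out = df.to_csv(index=False)
print(totals)
```

Passing a file path to `pd.read_csv` and `df.to_csv` would do the same thing against real files.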

In [6]:
# Import pandas and NumPy under their conventional aliases
import pandas as pd
import numpy as np

## 3. Pandas Objects
- Before we dive into Series, let's do a quick recap of pandas objects. At the core of the pandas library there are two fundamental data structures:
    - Series
    - DataFrame

### a. Series
- What is a Series?
    - A one-dimensional labeled array
    - Can hold data of any type
    - Is like a column in a table
- What can a Series hold?
    - All numbers, for example integers: dtype=int64
    - All strings: dtype=object
    - A mix of numbers and strings: dtype=object
    - Like a Python list, a Series accepts values of any type: integers, strings, floats, and so on.
    - Every item in a Series is labeled with an index, which by default starts at 0.
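
The dtype claims above can be checked directly. This is a minimal sketch with made-up values; the string case is noted as version-dependent because newer pandas releases may report a dedicated string dtype instead of `object`.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])        # all integers
words = pd.Series(["a", "b", "c"])    # all strings
mixed = pd.Series([1, "b", 3.5])      # numbers and strings together

print(numbers.dtype)  # int64
print(words.dtype)    # object on pandas 2.x; may differ on newer versions
print(mixed.dtype)    # object

# Without an explicit index, labels default to 0, 1, 2, ...
print(list(numbers.index))
```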

In [4]:
list1 = [1, 2, 3, 4, 5]
pd.Series(list1)

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]:
# A dict also works: its keys become the index labels
dict1 = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}
pd.Series(dict1)

one      1
two      2
three    3
four     4
five     5
dtype: int64
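
Once a Series has labels, values can be looked up by label or by position. The sketch below uses a shortened version of the dictionary above; the second Series `t` and its labels are hypothetical and only show that an explicit index can be passed alongside a list.

```python
import pandas as pd

s = pd.Series({"one": 1, "two": 2, "three": 3})

by_label = s["three"]        # look up by index label
by_position = s.iloc[0]      # look up by integer position
subset = s[["one", "two"]]   # a list of labels returns a smaller Series

# An explicit index can also be supplied when building from a list.
t = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(by_label, by_position, t["b"])
```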

### b. DataFrames
- What is a DataFrame?
    - A two-dimensional table
    - Made up of a collection of Series, one per column
    - Structured with labeled axes (rows and columns)

In [8]:
# You can create a DataFrame from a Python list or a NumPy array:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df1 = pd.DataFrame(data)
df1

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9


In [11]:
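# Note: a NumPy array has a single dtype, so mixing numbers and strings stores every value as a string
array1 = np.array([[1000, 'A', 1.5], [2000, 'B', 2.5], [3000, 'C', 3.5]])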
df2 = pd.DataFrame(array1, columns=['ID', 'Name', 'Value'], index=['1', '2', '3'])
df2

     ID Name Value
1  1000    A   1.5
2  2000    B   2.5
3  3000    C   3.5
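
One caveat with the NumPy route: a NumPy array holds a single dtype, so an array that mixes numbers and strings stores everything as strings, and the resulting DataFrame columns are not numeric. The sketch below shows one way to check this and convert the columns back with `astype`; the two-row array is a shortened stand-in for the one above.

```python
import numpy as np
import pandas as pd

array1 = np.array([[1000, "A", 1.5], [2000, "B", 2.5]])
print(array1.dtype.kind)  # 'U' means every element is stored as a Unicode string

df = pd.DataFrame(array1, columns=["ID", "Name", "Value"])

# Convert the numeric-looking columns back to real numbers.
df = df.astype({"ID": "int64", "Value": "float64"})
print(df.dtypes)
```

Building the DataFrame from a dictionary of lists avoids the conversion step, since each column keeps its own dtype.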


In [12]:
data1 = {
    'ID': [1000, 2000, 3000],
    'Name': ['A', 'B', 'C'],
    'Value': [1.5, 2.5, 3.5]
}
df3 = pd.DataFrame(data1)
df3

     ID Name  Value
0  1000    A    1.5
1  2000    B    2.5
2  3000    C    3.5


### c. Summary
- Each column of a DataFrame is a Series.
- A DataFrame is a collection of Series.
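
The relationship between the two objects can be seen in both directions. This is a small sketch, using `pd.concat` as one possible way to assemble a DataFrame from Series; the names `ids` and `names` are illustrative.

```python
import pandas as pd

ids = pd.Series([1000, 2000, 3000], name="ID")
names = pd.Series(["A", "B", "C"], name="Name")

# Combine the Series side by side; each one becomes a column.
df = pd.concat([ids, names], axis=1)

column = df["ID"]               # selecting one column gives back a Series
print(type(column).__name__)    # Series
print(type(df).__name__)        # DataFrame
```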