## Pandas Quick Start

Pandas is a Python library for data processing and analysis. It provides two main data structures: Series (one-dimensional) and DataFrame (two-dimensional).
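To make the difference between the two structures concrete, here is a minimal sketch. The names `ages` and `people` and the sample values are illustrative assumptions, not part of the dataset used later.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")

# A DataFrame is a two-dimensional table with labeled rows and columns.
people = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["New York", "San Francisco", "Los Angeles"]},
    index=["Alice", "Bob", "Charlie"],
)

print(ages.ndim, people.ndim)   # number of dimensions of each object
print(type(people["Age"]))      # selecting a single column yields a Series
```

Selecting one column of a DataFrame gives back a Series, so the two structures are best thought of as a table and its columns.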

### 1. Installing Pandas

First, make sure Pandas is installed. If it is not, install it with the following command:

In [1]:
!pip install pandas

Looking in indexes: http://mirrors.tencentyun.com/pypi/simple

### 2. Importing Pandas

In Jupyter Notebook, import Pandas with the following statement:

In [2]:
import pandas as pd

### 3. The Most Commonly Used Pandas Object Type: DataFrame

#### 3.1 Creating a DataFrame

In [3]:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
df

,Name,Age,City
0,Alice,25,New York
1,Bob,30,San Francisco
2,Charlie,35,Los Angeles


In [4]:
type(df)

pandas.core.frame.DataFrame

#### 3.2 Saving a DataFrame to a CSV File

In [5]:
# index_label=False omits the header field for the index column; the index values are still written
df.to_csv("test.csv", index_label=False)
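The `index_label` and `index` parameters of `to_csv` are easy to confuse. The sketch below compares them on a small, hypothetical DataFrame. It calls `to_csv` without a path so that the CSV text is returned as a string rather than written to disk.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv(index_label=False)  # index values kept, no header field for them
without_index = df.to_csv(index=False)     # index column dropped entirely

print(with_index)
print(without_index)
```

If the row index carries no information, `index=False` is usually the simpler choice, because the file then reads back without an extra column.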

#### 3.3 Reading CSV and Excel Files into a DataFrame

In [6]:
data = pd.read_csv("mater.csv")
data.head()

,pretty_formula,material_id,nsites,nelements,unit_cell_formula,e_above_hull,formation_energy_per_atom,energy,energy_per_atom,volume,...,symbol,point_group,band_gap,density,a,b,c,alpha,beta,gamma
0,WS,mp-1004526,4,2,"{'W': 2.0, 'S': 2.0}",0.436079,-0.505673,-34.919137,-8.729784,64.852032,...,,,0.0,11.056513,3.02551,3.02551,8.1808,90.0,90.0,120.000001
1,HfAu,mp-1007755,4,2,"{'Hf': 2.0, 'Au': 2.0}",0.0,-0.544234,-28.633094,-7.158273,78.612582,...,,,0.0,15.861591,3.507607,3.507607,6.38955,90.0,90.0,90.0
2,Ca2Pb,mp-1009756,3,2,"{'Ca': 2.0, 'Pb': 1.0}",0.038838,-0.503173,-9.268216,-3.089405,111.432037,...,,,0.6495,4.282124,5.401425,5.401425,5.401425,60.0,60.0,60.0
3,SbIr,mp-10125,4,2,"{'Sb': 2.0, 'Ir': 2.0}",0.086173,-0.192436,-26.752919,-6.68823,80.038446,...,,,0.0,13.028016,4.062594,4.062594,5.599652,90.0,90.0,120.0
4,Ba3AsP,mp-1013586,5,3,"{'Ba': 3.0, 'As': 1.0, 'P': 1.0}",0.346223,-0.681859,-19.250938,-3.850188,223.748507,...,,,0.0,3.843395,6.070904,6.070904,6.070904,90.0,90.0,90.0


In [7]:
type(data)

pandas.core.frame.DataFrame
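When a CSV file was saved together with its row index, `read_csv` treats that first column as ordinary data unless told otherwise. The following sketch uses an in-memory string (`csv_text`, a hypothetical stand-in for a file on disk) to show the effect of `index_col=0`.

```python
import io
import pandas as pd

# Hypothetical CSV text; the first, unnamed column holds a saved row index.
csv_text = ",Name,Age\n0,Alice,25\n1,Bob,30\n"

# By default the unnamed first column is read as a regular column.
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```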

In addition, the read_excel function reads Excel files. For example:

data = pd.read_excel('test.xlsx', sheet_name='nam')
data.head()

#### 3.4 Common Pandas Operations
- Viewing data
- Selecting data
- Handling missing values
- Filtering data
- Sorting data
- Grouping data
- Statistics
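Before applying these operations to the full dataset, it may help to see several of them on a tiny table. This is a minimal sketch: the DataFrame `toy` and its values are hypothetical, and filling with the mean is just one of several possible ways to handle a missing value.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing value.
toy = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "Age": [25, 30, None, 30],
    "City": ["New York", "San Francisco", "New York", "Los Angeles"],
})

# Missing values: count them per column, then fill missing ages with the mean age.
missing_per_column = toy.isna().sum()
filled = toy.fillna({"Age": toy["Age"].mean()})

# Filtering: keep rows where a boolean condition holds.
over_26 = filled[filled["Age"] > 26]

# Sorting: order rows by Age, largest first.
by_age = filled.sort_values("Age", ascending=False)

# Grouping: average Age within each City.
mean_age_by_city = filled.groupby("City")["Age"].mean()

# Statistics: count, mean, std, min, quartiles, and max of one column.
summary = filled["Age"].describe()

print(missing_per_column)
print(over_26)
print(mean_age_by_city)
```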

In [8]:
# View the first 5 rows
data.head()

,pretty_formula,material_id,nsites,nelements,unit_cell_formula,e_above_hull,formation_energy_per_atom,energy,energy_per_atom,volume,...,symbol,point_group,band_gap,density,a,b,c,alpha,beta,gamma
0,WS,mp-1004526,4,2,"{'W': 2.0, 'S': 2.0}",0.436079,-0.505673,-34.919137,-8.729784,64.852032,...,,,0.0,11.056513,3.02551,3.02551,8.1808,90.0,90.0,120.000001
1,HfAu,mp-1007755,4,2,"{'Hf': 2.0, 'Au': 2.0}",0.0,-0.544234,-28.633094,-7.158273,78.612582,...,,,0.0,15.861591,3.507607,3.507607,6.38955,90.0,90.0,90.0
2,Ca2Pb,mp-1009756,3,2,"{'Ca': 2.0, 'Pb': 1.0}",0.038838,-0.503173,-9.268216,-3.089405,111.432037,...,,,0.6495,4.282124,5.401425,5.401425,5.401425,60.0,60.0,60.0
3,SbIr,mp-10125,4,2,"{'Sb': 2.0, 'Ir': 2.0}",0.086173,-0.192436,-26.752919,-6.68823,80.038446,...,,,0.0,13.028016,4.062594,4.062594,5.599652,90.0,90.0,120.0
4,Ba3AsP,mp-1013586,5,3,"{'Ba': 3.0, 'As': 1.0, 'P': 1.0}",0.346223,-0.681859,-19.250938,-3.850188,223.748507,...,,,0.0,3.843395,6.070904,6.070904,6.070904,90.0,90.0,90.0


In [9]:
# View the last 5 rows
data.tail()

,pretty_formula,material_id,nsites,nelements,unit_cell_formula,e_above_hull,formation_energy_per_atom,energy,energy_per_atom,volume,...,symbol,point_group,band_gap,density,a,b,c,alpha,beta,gamma
133686,MnCdTe,mp-1232389,6,3,"{'Mn': 2.0, 'Cd': 2.0, 'Te': 2.0}",0.602764,0.294416,-24.666302,-4.11105,113.846005,...,,,0.0,8.604156,5.356025,5.356025,5.344017,103.399204,121.603159,104.236637
133687,PmGdMg2,mp-1232424,4,3,"{'Pm': 1.0, 'Gd': 1.0, 'Mg': 2.0}",0.011179,-0.091582,-22.417786,-5.604446,115.519576,...,,,0.0,5.043446,5.466678,5.466678,5.466678,60.0,60.0,60.0
133688,SbAsPCl13,mp-569209,32,4,"{'Sb': 2.0, 'As': 2.0, 'P': 2.0, 'Cl': 26.0}",0.0,-1.172916,-98.524704,-3.078897,1066.434115,...,,,2.1081,2.144258,9.840591,9.840591,12.179556,90.0,90.0,115.285453
133689,LiH2N3O,mp-859133,42,4,"{'Li': 6.0, 'H': 12.0, 'N': 18.0, 'O': 6.0}",0.062237,-0.828726,-253.67511,-6.039884,422.72974,...,,,3.6515,1.578553,9.314677,9.314695,5.625952,89.999999,90.000028,120.000069
133690,Li3Al2CrO6,mp-861573,12,4,"{'Li': 3.0, 'Al': 2.0, 'Cr': 1.0, 'O': 6.0}",0.013681,-2.850281,-80.446627,-6.703886,103.719773,...,,,3.777,3.566654,4.984852,4.984852,5.102318,80.540479,99.459521,60.01689


In [10]:
# Select data (single column)
data["material_id"]

0         mp-1004526
1         mp-1007755
2         mp-1009756
3           mp-10125
4         mp-1013586
             ...    
133686    mp-1232389
133687    mp-1232424
133688     mp-569209
133689     mp-859133
133690     mp-861573
Name: material_id, Length: 133691, dtype: object

In [11]:
# Select data (multiple columns)
data[["material_id", "nelements"]]

,material_id,nelements
0,mp-1004526,2
1,mp-1007755,2
2,mp-1009756,2
3,mp-10125,2
4,mp-1013586,3
...,...,...
133686,mp-1232389,3
133687,mp-1232424,3
133688,mp-569209,4
133689,mp-859133,4
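Bracket indexing selects whole columns. To select rows, or rows and columns together, pandas also offers `iloc` (by integer position) and `loc` (by label or boolean mask). The sketch below uses a small stand-in table, `toy`, built from the first five rows shown earlier; only three of the columns are reproduced.

```python
import pandas as pd

# Stand-in for the materials table: first five rows, three columns only.
toy = pd.DataFrame({
    "material_id": ["mp-1004526", "mp-1007755", "mp-1009756", "mp-10125", "mp-1013586"],
    "nsites": [4, 4, 3, 4, 5],
    "nelements": [2, 2, 2, 2, 3],
})

# iloc selects by integer position; a single row comes back as a Series.
first_row = toy.iloc[0]

# loc selects by label; label slices include both endpoints.
id_slice = toy.loc[1:3, "material_id"]

# loc also accepts a boolean mask together with a list of columns.
ternary = toy.loc[toy["nelements"] == 3, ["material_id", "nsites"]]

print(first_row)
print(id_slice)
print(ternary)
```

Note that `toy.loc[1:3]` returns three rows (labels 1, 2, and 3), whereas the positional slice `toy.iloc[1:3]` would return only two.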


In [12]:
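# Handle missing values: start by inspecting the data
# Note: without parentheses, data.info only displays the bound method (as in the output below);
# call data.info() to print per-column non-null counts and dtypes
data.info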

<bound method DataFrame.info of        pretty_formula material_id  nsites  nelements  \
0                  WS  mp-1004526       4          2   
1                HfAu  mp-1007755       4          2   
2               Ca2Pb  mp-1009756       3          2   
3                SbIr    mp-10125       4          2   
4              Ba3AsP  mp-1013586       5          3   
...               ...         ...     ...        ...   
133686         MnCdTe  mp-1232389       6          3   
133687        PmGdMg2  mp-1232424       4          3   
133688      SbAsPCl13   mp-569209      32          4   
133689        LiH2N3O   mp-859133      42          4   
133690     Li3Al2CrO6   mp-861573      12          4   

                                   unit_cell_formula  e_above_hull  \
0                               {'W': 2.0, 'S': 2.0}      0.436079   
1                             {'Hf': 2.0, 'Au': 2.0}      0.000000   
2                             {'Ca': 2.0, 'Pb': 1.0}      0.038838   
3              