# 1 datetime

How to use datetime to work with date and time data in a DataFrame.

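## 1.1 pd.to_datetime(): converting date strings to datetime

In a DataFrame, dates are often stored as strings. They must be converted to a datetime type before date arithmetic and other time-based operations can be applied.

pd.to_datetime(arg, format=...)

- arg: the object to convert to datetime, for example a Series
- format: the layout of the date strings (pass it as a keyword argument)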

In [None]:
!pip install pandas numpy

In [3]:
import pandas as pd

# Build an example in which the dates are stored as strings

data = {'Date':['2023-01-01','2023-01-02','2023-01-03'],'value':[100,200,300]}
df = pd.DataFrame(data)
df.Date

0    2023-01-01
1    2023-01-02
2    2023-01-03
Name: Date, dtype: object

In [6]:
# Convert the column to datetime
df['Date'] = pd.to_datetime(df['Date'])
df

Unnamed: 0,Date,value
0,2023-01-01,100
1,2023-01-02,200
2,2023-01-03,300


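format describes the layout of the date string. This matters when the year, month, and day appear in a different order, as in US-style month-day-year dates. Common directives:

- %Y: four-digit year
- %m: month, 01-12
- %d: day of the month, 01-31
- %H: hour (24-hour clock), 00-23
- %M: minute, 00-59
- %S: second, 00-59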



In [8]:
pd.to_datetime("01-03-2024",format="%m-%d-%Y")

Timestamp('2024-01-03 00:00:00')
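The directive list also covers the time of day, which the example above does not use. The sketch below parses a small, made-up Series in day/month/year order with a time part. The `errors='coerce'` argument is an addition not discussed in the text; it is used here only to show what happens to a string that does not match the format.

```python
import pandas as pd

# Hypothetical sample data: day/month/year order with a time part,
# plus one malformed entry
raw = pd.Series(['03/01/2024 14:30:00', '15/02/2024 08:05:30', 'not a date'])

# %d/%m/%Y says the day comes first; %H:%M:%S parses the time of day.
# errors='coerce' turns values that do not match the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M:%S', errors='coerce')
print(parsed)
```

Without `errors='coerce'`, the malformed entry would raise an error, which is often the safer default when the data is expected to be clean.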

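## 1.2 Extracting date and time components with .dt

Once a column has a datetime type, the year, month, day, hour, minute, and second can be extracted through the .dt accessor:

- dt.year
- dt.month
- dt.day
- dt.hour
- dt.minute
- dt.second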

In [9]:
df = pd.DataFrame(
    data={'Date':['02-01-2024','03-01-2024','04-01-2024'],'value':[122,233,344]}
)

In [10]:
df['Date'] = pd.to_datetime(df['Date'],format='%m-%d-%Y')
df

Unnamed: 0,Date,value
0,2024-02-01,122
1,2024-03-01,233
2,2024-04-01,344


In [14]:
# Extract the individual components

df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day

df['Hour'] = df['Date'].dt.hour
df['Minute'] = df['Date'].dt.minute
df['Second'] = df['Date'].dt.second
df

Unnamed: 0,Date,value,Year,Month,Day,Hour,Minute,Second
0,2024-02-01,122,2024,2,1,0,0,0
1,2024-03-01,233,2024,3,1,0,0,0
2,2024-04-01,344,2024,4,1,0,0,0


## 1.3 Filtering data by date

Rows can be filtered with a boolean condition or with a date range.

In [16]:
df[df['Month'] == 2]

Unnamed: 0,Date,value,Year,Month,Day,Hour,Minute,Second
0,2024-02-01,122,2024,2,1,0,0,0


In [24]:
# Create a Timestamp object
timestamp1 = pd.Timestamp('2024-03-01')
# Keep only the rows dated after it
df[df['Date'] > timestamp1]

Unnamed: 0,Date,value,Year,Month,Day,Hour,Minute,Second
2,2024-04-01,344,2024,4,1,0,0,0
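The examples above use a single comparison. A date range can be expressed by combining two comparisons, or with `Series.between`. The sketch below rebuilds a small version of the DataFrame so that it runs on its own; the boundary dates are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['02-01-2024', '03-01-2024', '04-01-2024'],
                           format='%m-%d-%Y'),
    'value': [122, 233, 344],
})

# Arbitrary range boundaries chosen for this example
start = pd.Timestamp('2024-02-15')
end = pd.Timestamp('2024-04-01')

# Combine two boolean conditions; both ends are included here
in_range = df[(df['Date'] >= start) & (df['Date'] <= end)]

# Series.between is an equivalent shorthand (inclusive on both ends by default)
in_range_between = df[df['Date'].between(start, end)]
print(in_range)
```

Note the parentheses around each comparison: `&` binds more tightly than `>=` in Python, so they are required.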


## 1.4 Date arithmetic and time differences

### 1.4.1 Computing a time difference

In [28]:
df = pd.DataFrame(
    data={'start':['02-01-2024','03-01-2024','04-01-2024'],'end':['02-05-2024','03-08-2024','04-11-2024']}
)
df['start'] = pd.to_datetime(df['start'],format='%m-%d-%Y')
df['end'] = pd.to_datetime(df['end'],format='%m-%d-%Y')
df['during'] = df['end'] - df['start']
df

Unnamed: 0,start,end,during
0,2024-02-01,2024-02-05,4 days
1,2024-03-01,2024-03-08,7 days
2,2024-04-01,2024-04-11,10 days


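### 1.4.2 pd.Timedelta(): adding time to or subtracting time from a date

Keyword arguments:
- weeks: number of weeks
- days: number of days
- hours: number of hours
- minutes: number of minutes
- seconds: number of seconds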

In [30]:
df['third_time'] = df['start'] + pd.Timedelta(weeks=1,hours=2)
df

Unnamed: 0,start,end,during,third_time
0,2024-02-01,2024-02-05,4 days,2024-02-08 02:00:00
1,2024-03-01,2024-03-08,7 days,2024-03-08 02:00:00
2,2024-04-01,2024-04-11,10 days,2024-04-08 02:00:00
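Subtraction works the same way as addition, and a timedelta column can be reduced to plain numbers when that is easier to work with. The following is a minimal sketch on a two-row copy of the data; the column names `day_before` and `during_days` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'start': pd.to_datetime(['02-01-2024', '03-01-2024'], format='%m-%d-%Y'),
    'end': pd.to_datetime(['02-05-2024', '03-08-2024'], format='%m-%d-%Y'),
})

# Subtract a fixed offset from every date in the column
df['day_before'] = df['start'] - pd.Timedelta(days=1)

# A timedelta column has its own .dt accessor; .dt.days gives whole days
df['during_days'] = (df['end'] - df['start']).dt.days
print(df)
```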


## 1.5 Grouping by date



In [33]:
df.groupby('start')['during'].mean()

start
2024-02-01    4 days
2024-03-01    7 days
2024-04-01   10 days
Name: during, dtype: timedelta64[ns]
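In the example above every date is unique, so each group holds a single row. Grouping becomes more useful when several rows share a period. The sketch below uses a made-up `sales` DataFrame and groups by the month extracted with `.dt.month`.

```python
import pandas as pd

# Hypothetical data with two rows per month
sales = pd.DataFrame({
    'Date': pd.to_datetime(['2024-02-01', '2024-02-15',
                            '2024-03-01', '2024-03-20']),
    'value': [100, 200, 300, 500],
})

# Group rows by calendar month number and sum the values in each group
monthly = sales.groupby(sales['Date'].dt.month)['value'].sum()
print(monthly)
```

Grouping by month number merges the same month from different years. If the data spans several years, grouping by `sales['Date'].dt.to_period('M')` keeps them apart.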

## 1.6 Setting and using a datetime index

pandas supports using a date column as the index.

In [34]:
df

Unnamed: 0,start,end,during,third_time
0,2024-02-01,2024-02-05,4 days,2024-02-08 02:00:00
1,2024-03-01,2024-03-08,7 days,2024-03-08 02:00:00
2,2024-04-01,2024-04-11,10 days,2024-04-08 02:00:00


In [35]:
# Use the start column as the datetime index
df.set_index('start',inplace=True)
df

start,end,during,third_time
2024-02-01,2024-02-05,4 days,2024-02-08 02:00:00
2024-03-01,2024-03-08,7 days,2024-03-08 02:00:00
2024-04-01,2024-04-11,10 days,2024-04-08 02:00:00


In [37]:
# Select a row by its date
df.loc['2024-02-01']

end           2024-02-05 00:00:00
during            4 days 00:00:00
third_time    2024-02-08 02:00:00
Name: 2024-02-01 00:00:00, dtype: object
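A datetime index also accepts date ranges and partial date strings in `.loc`. The sketch below builds a small standalone DataFrame (named `ts` here for illustration) rather than reusing `df`.

```python
import pandas as pd

ts = pd.DataFrame(
    {'value': [122, 233, 344]},
    index=pd.to_datetime(['2024-02-01', '2024-03-01', '2024-04-01']),
)

# Label slices on a sorted DatetimeIndex include both endpoints
feb_to_mar = ts.loc['2024-02-01':'2024-03-01']

# A partial string such as a year-month selects every row in that month
march = ts.loc['2024-03']
print(feb_to_mar)
print(march)
```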

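## 1.7 strptime and strftime

strptime and strftime convert between datetime.datetime objects and str.

### 1.7.1 Format strings

Common directives include:
- %Y: four-digit year
- %m: month, 01-12
- %d: day of the month, 01-31
- %H: hour (24-hour clock), 00-23
- %M: minute, 00-59
- %S: second, 00-59
- %I: hour (12-hour clock), 01-12
- %p: AM or PM

### 1.7.2 strptime

strptime parses a time string into a datetime object.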


In [40]:
import pandas as pd
from datetime import datetime
data = {
    'date_str':['2025-02-28 10:00:00','2024-09-04 13:00:07','2045-09-01 23:58:45']
}

df3 = pd.DataFrame(data)
df3

Unnamed: 0,date_str
0,2025-02-28 10:00:00
1,2024-09-04 13:00:07
2,2045-09-01 23:58:45


In [43]:
# Use strptime to parse each string into a datetime

df3['parse_date'] = df3.apply(lambda x:datetime.strptime(x['date_str'],'%Y-%m-%d %H:%M:%S'),axis=1)
df3

Unnamed: 0,date_str,parse_date
0,2025-02-28 10:00:00,2025-02-28 10:00:00
1,2024-09-04 13:00:07,2024-09-04 13:00:07
2,2045-09-01 23:58:45,2045-09-01 23:58:45



### 1.7.3 strftime
strftime formats a datetime object as a string in the given format.

In [47]:
df3['parse_str'] = df3['parse_date'].apply(lambda x:datetime.strftime(x,'%d/%m/%Y %I:%M %p'))
df3

Unnamed: 0,date_str,parse_date,parse_str
0,2025-02-28 10:00:00,2025-02-28 10:00:00,28/02/2025 10:00 AM
1,2024-09-04 13:00:07,2024-09-04 13:00:07,04/09/2024 01:00 PM
2,2045-09-01 23:58:45,2045-09-01 23:58:45,01/09/2045 11:58 PM
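The two `apply` calls above process one value at a time. As an alternative sketch (a substitute technique, not the approach used in this notebook), the same conversions can be written with pandas' column-wise tools, `pd.to_datetime` and `.dt.strftime`, using the same format directives.

```python
import pandas as pd

df3 = pd.DataFrame({
    'date_str': ['2025-02-28 10:00:00', '2024-09-04 13:00:07'],
})

# Parse the whole column at once instead of calling strptime row by row
df3['parse_date'] = pd.to_datetime(df3['date_str'], format='%Y-%m-%d %H:%M:%S')

# .dt.strftime formats every value in the column with the same directives
df3['parse_str'] = df3['parse_date'].dt.strftime('%d/%m/%Y %I:%M %p')
print(df3)
```

The column-wise form is usually shorter and tends to be faster on large columns, while `apply` with `strptime` stays handy when each row needs custom handling.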


## 1.8 datetime.now(): getting the current time

In [50]:
from datetime import datetime
now = datetime.now()
now

datetime.datetime(2025, 10, 30, 0, 46, 23, 687630)

In [53]:
# Format the current time

datetime.strftime(now,"%m/%d/%Y %I:%M:%S %p")

'10/30/2025 12:46:23 AM'

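# 2 time

The time package provides functions for getting the current time, measuring how long a program runs, and pausing execution. It can also be used to compare the run time of different blocks of code.

## 2.1 time.time()

Returns the current time as a timestamp in seconds, that is, the number of seconds elapsed since the Unix epoch (1970-01-01 00:00:00 UTC).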

In [55]:
import time

timestamp = time.time()
print(timestamp)

1761756720.8152926
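To make the link between a timestamp and the Unix epoch concrete, the sketch below converts two fixed timestamps back to datetime objects. It uses `datetime.fromtimestamp` with an explicit UTC time zone, which is an addition not covered in the text, so that the result does not depend on the local time zone.

```python
from datetime import datetime, timezone

# Timestamp 0 is the Unix epoch itself
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# 86400 seconds (24 * 60 * 60) later is exactly one day after the epoch
one_day_later = datetime.fromtimestamp(86400, tz=timezone.utc)

print(epoch)
print(one_day_later)
```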


## 2.2 Measuring program run time



In [56]:
start_time = time.time()

# Code to be timed

for i in range(10):
    _ = i ** 2

end_time = time.time()

# Elapsed time in seconds
print(end_time-start_time)

5.125999450683594e-05
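For very short code like the loop above, the difference between two `time.time()` calls can be dominated by clock resolution. As a hedged alternative (not the method used in this notebook), the standard library also offers `time.perf_counter()`, a clock intended for measuring short durations. A minimal sketch:

```python
import time

start = time.perf_counter()

# Code to be timed: sum of squares of 0..9
total = sum(i ** 2 for i in range(10))

# Elapsed time in seconds; only the difference between two readings is meaningful
elapsed = time.perf_counter() - start
print(total, elapsed)
```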


## 2.3 time.sleep(seconds): pausing the program
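A minimal sketch of pausing execution: the program sleeps for a fraction of a second, and the pause is confirmed by measuring it with `time.time()`. The 0.2 second duration is an arbitrary value chosen for this example.

```python
import time

start = time.time()

# Pause execution for roughly 0.2 seconds (fractional seconds are allowed)
time.sleep(0.2)

elapsed = time.time() - start
print(elapsed)
```

The measured value is typically slightly larger than the requested pause, because the operating system decides when the program resumes.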

## 2.4 time.localtime(): the current local time, which can be formatted with time.strftime(format, time)

In [57]:
time.localtime()

time.struct_time(tm_year=2025, tm_mon=10, tm_mday=30, tm_hour=0, tm_min=56, tm_sec=0, tm_wday=3, tm_yday=303, tm_isdst=0)

In [60]:
time.strftime('%Y-%m-%d %H:%M:%S',time.localtime())

'2025-10-30 01:02:02'