# Clinical Trials

## Pubdate, Title, Link
Trial date: ```pubdate```  
Trial title: ```title```  
Trial URL: ```link``` 

In [1]:
url = 'https://clinicaltrials.gov/ct2/results/rss.xml?rcv_d=14&lup_d=&sel_rss=new14&term=apatinib&count=10000'

In [2]:
import feedparser
feed = feedparser.parse(url)

In [3]:
print(feed.status)

200


In [4]:
for f in feed.entries:
    pubdate = f.published_parsed
    title = f.title
    link = f.link
    
    print(pubdate)
    print(title)
    print(link)

time.struct_time(tm_year=2023, tm_mon=1, tm_mday=6, tm_hour=17, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=6, tm_isdst=0)
Effect of Huaier Granule on Nephrotoxicity Associated With Targeted Therapy for Advanced Hepatocellular Carcinoma.
https://clinicaltrials.gov/ct2/show/NCT05673824?term=apatinib&sfpd_d=14&sel_rss=new14


## Update
Trial update date: ```update```

In [5]:
import requests
from bs4 import BeautifulSoup

res = requests.get(link)
soup = BeautifulSoup(res.text, 'lxml')

In [6]:
print(res.status_code)

200


The ```update``` value is contained in a \<span> tag with the ```hit_org``` class.

In [7]:
update = soup.find_all('span', {'class':'hit_org'})[0].text
print(update)

January 6, 2023


## Convert Datetime Format

A clinical trial has two dates, ```pubdate``` and ```update```.  
- ```pubdate``` : the date the trial was first registered  
- ```update```&nbsp;&nbsp; : the date the trial was last updated  

Once a trial has been updated, ```update``` matters more than ```pubdate```.  
So ```pubdate``` is left as is and only ```update``` is preprocessed. 

In [8]:
from datetime import datetime

Convert the date to ```yyyy-mm-dd``` format.

In [9]:
update = datetime.strptime(update, '%B %d, %Y').date()
print(update)

2023-01-06


In [10]:
type(update)

datetime.date

Convert the date to the ```str``` data type.

In [11]:
update = update.strftime('%Y-%m-%d')
print(update)

2023-01-06


In [12]:
type(update)

str
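
The parse and format steps above can be combined into one helper. This is a minimal sketch, assuming the page always renders the date like ```January 6, 2023```; ```normalize_update``` is a hypothetical name. Note that ```%B``` matches month names in the current locale, so the sketch also assumes an English locale.

```python
from datetime import datetime

def normalize_update(text):
    """Convert a date like 'January 6, 2023' to 'yyyy-mm-dd'.

    Returns None if the text does not match the expected format.
    """
    try:
        parsed = datetime.strptime(text.strip(), '%B %d, %Y')
    except ValueError:
        return None
    return parsed.strftime('%Y-%m-%d')

print(normalize_update('January 6, 2023'))
print(normalize_update('not a date'))
```

Returning ```None``` instead of raising lets the caller decide whether to skip a row or stop when the page layout changes.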

## Sort

#### 1D list

In [13]:
updates = ['2023-02-09','2023-01-08']
print(updates)

['2023-02-09', '2023-01-08']


Use ```sort()``` to sort by date, oldest first.

In [14]:
updates.sort()
print(updates)

['2023-01-08', '2023-02-09']


Pass ```reverse=True``` to sort newest first.

In [15]:
updates.sort(reverse=True)
print(updates)

['2023-02-09', '2023-01-08']
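
Sorting the strings directly works because zero-padded ```yyyy-mm-dd``` strings compare in the same order as the dates they represent. The check below uses made-up sample values to show this, and contrasts it with the original ```January 6, 2023``` style, which does not sort chronologically as text.

```python
from datetime import datetime

iso_dates = ['2023-02-09', '2022-12-31', '2023-01-08']
raw_dates = ['February 9, 2023', 'December 31, 2022', 'January 8, 2023']

# ISO-style strings: plain string order matches chronological order.
by_string = sorted(iso_dates)
by_date = sorted(iso_dates, key=lambda s: datetime.strptime(s, '%Y-%m-%d'))
print(by_string == by_date)

# Month-name strings: alphabetical order is not chronological order.
print(sorted(raw_dates))
```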


#### 2D list

In [16]:
updates = [['2023-02-09','2023-01-08'],['2023-03-10','2023-04-07']]
print(updates)

[['2023-02-09', '2023-01-08'], ['2023-03-10', '2023-04-07']]


Use a ```lambda``` as the sort key to order the rows by field.

In [17]:
updates.sort(key=lambda row : (row[1], row[0]), reverse=True)
print(updates)

[['2023-03-10', '2023-04-07'], ['2023-02-09', '2023-01-08']]
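
Because the key is a tuple, rows are compared on ```row[1]``` first and on ```row[0]``` only when ```row[1]``` ties. The sketch below uses made-up rows to make the tie-break visible. ```operator.itemgetter(1, 0)``` is shown as an equivalent way to build the same key.

```python
from operator import itemgetter

rows = [
    ['2023-01-05', '2023-02-01'],
    ['2023-01-09', '2023-02-01'],  # same second field as the row above
    ['2023-03-10', '2023-01-20'],
]

with_lambda = sorted(rows, key=lambda row: (row[1], row[0]), reverse=True)
with_itemgetter = sorted(rows, key=itemgetter(1, 0), reverse=True)

print(with_lambda)
print(with_lambda == with_itemgetter)
```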


## Full Code

The first URL returns newly registered clinical trials,  
and the second URL returns recently updated clinical trials.

In [18]:
from datetime import datetime
from bs4 import BeautifulSoup
import feedparser
import requests

urls = [
    # first_posted
    'https://clinicaltrials.gov/ct2/results/rss.xml?rcv_d=14&lup_d=&sel_rss=new14&term=apatinib&count=10000',
    # last_update_posted
    'https://clinicaltrials.gov/ct2/results/rss.xml?rcv_d=&lup_d=14&sel_rss=mod14&term=apatinib&count=10000'
]

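def extract_update(link):
    """Fetch a trial page and return its update date as 'yyyy-mm-dd'."""
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'lxml')
    # The update date is read from the first <span class="hit_org"> on the page.
    update = soup.find_all('span', {'class':'hit_org'})[0].text
    
    # 'January 6, 2023' -> '2023-01-06'
    update = datetime.strptime(update, '%B %d, %Y').date()
    update = update.strftime('%Y-%m-%d')

    return update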

def clinicaltrials():
    """Collect [pubdate, update, title, link] rows from both feeds, newest update first."""
    clinical_trials = []
    
    for url in urls:
        feed = feedparser.parse(url)
        
        for f in feed.entries:
            pubdate = f.published_parsed
            title = f.title
            link = f.link
            update = extract_update(link)
            
            clinical_trials.append([pubdate, update, title, link])
            
    clinical_trials.sort(key=lambda row : (row[1], row[0]), reverse=True)
            
    return clinical_trials

In [19]:
sources = clinicaltrials()

In [20]:
for pubdate, update, title, link in sources:
    print(update, title)

2023-01-13 Apatinib Combined With PD-1 in the Treatment of Recurrent or Metastatic Nasopharyngeal Carcinoma
2023-01-09 Effect of Huaier Granule on Nephrotoxicity Associated With Targeted Therapy for Advanced Hepatocellular Carcinoma.
2023-01-06 Effect of Huaier Granule on Nephrotoxicity Associated With Targeted Therapy for Advanced Hepatocellular Carcinoma.
2023-01-06 A Study of SHR-1210 in Combination With Capecitabine + Oxaliplatin or Apatinib in Treatment of Advanced Gastric Cancer
2023-01-05 A Study to Evaluate Camrelizumab Plus Rivoceranib (Apatinib) Versus Camrelizumab as Adjuvant Therapy in Patients With Hepatocellular Carcinoma (HCC) at High Risk of Recurrence After Curative Resection or Ablation
2023-01-04 SNF Platform Study of HR+/ HER2-advanced Breast Cancer
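
The Huaier Granule title appears twice in the output above, presumably because the trial is present in both feeds. If one row per trial is wanted, one possible approach is to key rows on the NCT identifier in the link and keep the first occurrence, which is the most recent one because the list is already sorted newest first. This is a sketch under assumptions: ```dedupe_by_nct``` is a hypothetical helper, the sample rows are shortened stand-ins for real data, and every link is assumed to contain an ID of the form ```NCT``` followed by digits.

```python
import re

NCT_PATTERN = re.compile(r'NCT\d+')

def dedupe_by_nct(rows):
    """Keep the first row for each NCT ID; rows are [pubdate, update, title, link]."""
    seen = set()
    unique = []
    for row in rows:
        match = NCT_PATTERN.search(row[3])
        # Fall back to the full link if no NCT ID is found.
        key = match.group(0) if match else row[3]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Stand-in rows: pubdate is None for brevity, NCT00000000 is a placeholder ID.
rows = [
    [None, '2023-01-09', 'Trial A', 'https://clinicaltrials.gov/ct2/show/NCT05673824?term=apatinib&sel_rss=mod14'],
    [None, '2023-01-06', 'Trial A', 'https://clinicaltrials.gov/ct2/show/NCT05673824?term=apatinib&sel_rss=new14'],
    [None, '2023-01-04', 'Trial B', 'https://clinicaltrials.gov/ct2/show/NCT00000000?term=apatinib'],
]
for _, update, title, _ in dedupe_by_nct(rows):
    print(update, title)
```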
