* [Overview - OpenAI API](https://platform.openai.com/docs/overview)
* Model list : https://platform.openai.com/docs/models
* Try it out in the Playground : https://platform.openai.com/playground/chat?models=gpt-4o
* Issue an API key : https://platform.openai.com/api-keys
* Check usage and billing : https://platform.openai.com/usage

## Using the OpenAI API - Translating ctrip Data

### ENV - install library

In [None]:
#!pip install pandas
#!pip install openai    # OpenAI API
#!pip install openpyxl  # excel

### ENV - import

In [1]:
import pandas as pd
import sqlite3
import json
from openai import OpenAI
from tqdm import tqdm
tqdm.pandas()

In [2]:
df = pd.read_csv('data/OpenAI_API/ctrip_3000.csv')

In [3]:
df
# df["poiName"].to_dict()

Unnamed: 0,poiId,poiName,commentCount,commentScore,districtName,zoneName,distanceStr,coverImageUrl,openStatus,latitude,longitude,detailUrl
0,134029217,史努比乐园,315,4.7,济州市,城山日出峰/表善面,距市中心23.3km,https://dimg04.c-ctrip.com/images/1lo0j12000fm...,18:00闭园,33.444196,126.778305,https://you.ctrip.com/sight/jeju1446512/134029...
1,81803,牛岛,1379,4.7,济州市,城山日出峰/表善面,距市中心38.8km,https://dimg04.c-ctrip.com/images/100w1f000001...,,33.504298,126.954048,https://you.ctrip.com/sight/jeju1446512/25765....
2,10759495,乱打秀(济州剧场),633,4.7,济州市,东门市场/济州市政府,距市中心6.0km,https://dimg04.c-ctrip.com/images/0101e12000as...,16:30开园,33.445597,126.547558,https://you.ctrip.com/sight/jeju1446512/140813...
3,131170473,ARTE全沉浸式美术馆-韩国济州,355,4.8,济州市,涯月邑,距市中心21.1km,https://dimg04.c-ctrip.com/images/0100612000f6...,,33.396475,126.344614,https://you.ctrip.com/sight/jeju1446512/571893...
4,23909829,济州Ecoland生态主题乐园,400,4.5,济州市,咸德/朝天/旧左,距市中心13.2km,https://dimg04.c-ctrip.com/images/100f1f000001...,17:00闭园,33.455530,126.668187,https://you.ctrip.com/sight/jeju1446512/178942...
...,...,...,...,...,...,...,...,...,...,...,...,...
2995,136837021,5·18 현황조각 및 추모승화공간,0,0.0,光州,,,https://dimg04.c-ctrip.com/images/fd/tg/g2/M02...,,35.157393,126.857818,https://you.ctrip.com/sight/gwangju433/1368370...
2996,136837051,염색장 정관채 전수교육관,0,0.0,罗州市,,,https://dimg04.c-ctrip.com/images/fd/tg/g1/M08...,18:00闭园,34.997241,126.634259,https://you.ctrip.com/sight/najusi1595529/1368...
2997,136837167,구례관광특구,0,0.0,求礼郡,,,https://dimg04.c-ctrip.com/images/fd/tg/g1/M06...,,35.202857,127.526397,https://you.ctrip.com/sight/gurye120565/136837...
2998,136837224,Han Hee Won Art Museum,0,0.0,光州,,,https://dimg04.c-ctrip.com/images/fd/tg/g1/M06...,,35.141078,126.914141,https://you.ctrip.com/sight/gwangju433/1368372...


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 12 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   poiId          3000 non-null   int64  
 1   poiName        3000 non-null   object 
 2   commentCount   3000 non-null   int64  
 3   commentScore   3000 non-null   float64
 4   districtName   3000 non-null   object 
 5   zoneName       1326 non-null   object 
 6   distanceStr    1255 non-null   object 
 7   coverImageUrl  3000 non-null   object 
 8   openStatus     543 non-null    object 
 9   latitude       3000 non-null   float64
 10  longitude      3000 non-null   float64
 11  detailUrl      3000 non-null   object 
dtypes: float64(3), int64(2), object(7)
memory usage: 281.4+ KB


In [9]:
# Database settings
db_name = 'poi_database.db'
table_name = 'poi_data'
columns_to_translate = ['poiName', 'districtName', 'zoneName', 'distanceStr']

# Connect to the SQLite database and save the table
conn = sqlite3.connect(db_name)
try:
    # Save the DataFrame to a SQLite table
    # if_exists: 'replace' drops an existing table and recreates it, 'append' adds rows to an existing table, 'fail' raises an error if the table exists
    df.to_sql(table_name, conn, if_exists='replace', index=False)
    print(f"Table '{table_name}' was saved to database '{db_name}'.")
except Exception as e:
    print(f"Error while saving to the DB: {e}")
finally:
    conn.close()

Table 'poi_data' was saved to database 'poi_database.db'.

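The connect / try / finally pattern above can also be written with `contextlib.closing`, which closes the connection when the block exits. (sqlite3's own `with conn:` form only manages the transaction; it does not close the connection.) A minimal sketch on a throwaway in-memory table; the two demo rows are copied from the data above and the table layout is simplified for illustration.

```python
import sqlite3
from contextlib import closing

# In-memory database standing in for poi_database.db (assumption for the demo)
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE poi_data (poiId INTEGER, poiName TEXT)")
    conn.executemany(
        "INSERT INTO poi_data VALUES (?, ?)",
        [(81803, "牛岛"), (134029217, "史努比乐园")],
    )
    conn.commit()
    # Sanity check after a save: how many rows ended up in the table?
    row_count = conn.execute("SELECT COUNT(*) FROM poi_data").fetchone()[0]

print(row_count)  # 2
```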

In [13]:
# Load the table from the SQLite DB into a DataFrame
conn = sqlite3.connect(db_name)
try:
    df_read = pd.read_sql_query(f"SELECT * FROM {table_name}", conn)
    print(df_read)
except Exception as e:
    print(f"Error while loading into a DataFrame: {e}")
finally:
    conn.close()

          poiId                 poiName  commentCount  commentScore  \
0     134029217                   史努比乐园           315           4.7   
1         81803                      牛岛          1379           4.7   
2      10759495               乱打秀(济州剧场)           633           4.7   
3     131170473        ARTE全沉浸式美术馆-韩国济州           355           4.8   
4      23909829         济州Ecoland生态主题乐园           400           4.5   
...         ...                     ...           ...           ...   
2995  136837021      5·18 현황조각 및 추모승화공간             0           0.0   
2996  136837051           염색장 정관채 전수교육관             0           0.0   
2997  136837167                  구례관광특구             0           0.0   
2998  136837224  Han Hee Won Art Museum             0           0.0   
2999  136837529         한국농어촌공사 새만금33센터             0           0.0   

     districtName    zoneName distanceStr  \
0             济州市   城山日出峰/表善面  距市中心23.3km   
1             济州市   城山日出峰/表善面  距市中心38.8km   
2           

### ENV - Initialize the OpenAI client : Colab

In [None]:
# from google.colab import userdata
# OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

# Initialize the OpenAI client (reads the OPENAI_API_KEY environment variable by default)
client = OpenAI()
# client = OpenAI(api_key=OPENAI_API_KEY)
client

### ENV - Initialize the OpenAI client : Jupyter Notebook

In [25]:
OPENAI_API_KEY = 'MY_KEY'  # placeholder; replace with your own API key
# Initialize the OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)
client


<openai.OpenAI at 0x12fd67220>

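Hardcoding the key in a cell makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead. `load_api_key` is a hypothetical helper, not part of the `openai` package, and it assumes the key was exported as `OPENAI_API_KEY` before Jupyter was started.

```python
import os

def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting Jupyter")
    return key

# Usage in the notebook (commented out so the sketch has no side effects):
# client = OpenAI(api_key=load_api_key())
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.
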
### Func : Add _ko columns to the database

In [16]:
def add_translation_columns():
    """Add a TEXT column named '<col>_ko' for each column to translate, if it is missing."""
    conn = sqlite3.connect(db_name)
    try:
        cur = conn.cursor()

        # Look up the table's existing columns (row[1] is the column name)
        cur.execute(f"PRAGMA table_info({table_name})")
        existing_columns = [row[1] for row in cur.fetchall()]

        # Add any missing _ko columns
        for col in columns_to_translate:
            translated_col = f"{col}_ko"
            if translated_col not in existing_columns:
                alter_query = f"ALTER TABLE {table_name} ADD COLUMN {translated_col} TEXT"
                cur.execute(alter_query)
                print(f"Added column {translated_col}.")

        conn.commit()
    finally:
        # Close the connection even if one of the statements fails
        conn.close()

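Because the function checks `PRAGMA table_info` before each `ALTER TABLE`, it is safe to run more than once. A small sketch of that behavior on an in-memory table; `add_ko_columns` is a hypothetical variant that takes the connection as a parameter and returns the columns it added, and the demo schema is simplified.

```python
import sqlite3

def add_ko_columns(conn, table, columns):
    """Add missing '<col>_ko' TEXT columns; return the names actually added."""
    # Column 1 of each PRAGMA table_info row is the column name
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for col in columns:
        ko_col = f"{col}_ko"
        if ko_col not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {ko_col} TEXT")
            added.append(ko_col)
    conn.commit()
    return added

# Throwaway in-memory table standing in for poi_data
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE poi_data (poiId INTEGER, poiName TEXT, zoneName TEXT)")
first = add_ko_columns(demo, "poi_data", ["poiName", "zoneName"])
second = add_ko_columns(demo, "poi_data", ["poiName", "zoneName"])
demo.close()

print(first)   # ['poiName_ko', 'zoneName_ko']
print(second)  # []
```
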
### Func : Translate, one row at a time

In [17]:
def translate_and_update(row):
    try:
        # Build the text to translate as a JSON-serializable dict (non-null values only)
        to_translate = {col: row[col] for col in columns_to_translate if pd.notna(row[col])}

        if not to_translate:
            return

        # API request
        completion = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "user",
                    "content": f"Translate the following Chinese place names to Korean. Return the result as JSON with each key suffixed by '_ko': {json.dumps(to_translate, ensure_ascii=False)}"
                }
            ],
            max_tokens=300,
            temperature=0.7
        )

        # Strip a Markdown code fence, if the model wrapped its answer in one
        response_text = completion.choices[0].message.content.strip()

        if response_text.startswith("```"):
            response_text = response_text.replace("```json", "").replace("```", "").strip()

        # Parse the JSON
        response_json = json.loads(response_text)

        # Keep only the expected _ko keys, since keys are interpolated into the SQL as column names
        allowed = {f"{col}_ko" for col in to_translate}
        response_json = {k: v for k, v in response_json.items() if k in allowed}
        if not response_json:
            raise ValueError("response contained none of the expected _ko keys")

        # Update the database with the translations
        update_query = f"""
        UPDATE {table_name}
        SET {', '.join([f"{key} = ?" for key in response_json.keys()])}
        WHERE poiId = ?
        """
        values = list(response_json.values()) + [int(row['poiId'])]

        conn = sqlite3.connect(db_name)
        try:
            cur = conn.cursor()
            cur.execute(update_query, values)
            conn.commit()
        finally:
            conn.close()

        print(f"poiId {row['poiId']} updated.")

    except Exception as e:
        print(f"Error at poiId {row['poiId']}: {e}")

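The fence stripping and JSON parsing step can be isolated so it is testable without calling the API. A minimal sketch, assuming the model replies with a JSON object that may or may not be wrapped in a Markdown code fence; `parse_translation_reply` and `FENCE` are hypothetical names, and the sample reply is made up.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker

def parse_translation_reply(reply_text, source_columns):
    """Extract the expected '<col>_ko' string entries from a model reply."""
    text = reply_text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = {f"{col}_ko" for col in source_columns}
    return {k: v for k, v in data.items() if k in allowed and isinstance(v, str)}

reply = FENCE + 'json\n{"poiName_ko": "우도", "note": "ignored"}\n' + FENCE
parsed = parse_translation_reply(reply, ["poiName", "zoneName"])
print(parsed)  # {'poiName_ko': '우도'}
```

Filtering on the expected keys matters here because the keys later become column names in the `UPDATE` statement.
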
### RUN : Add the required columns to the database

In [18]:
add_translation_columns()

Added column poiName_ko.
Added column districtName_ko.
Added column zoneName_ko.
Added column distanceStr_ko.


### RUN : Connect to the database and load the rows that still need translation

In [20]:
conn = sqlite3.connect(db_name)
query = f"""
SELECT * FROM {table_name}
WHERE zoneName_ko IS NULL OR distanceStr_ko IS NULL LIMIT 5
"""
df = pd.read_sql(query, conn)
conn.close()

# Translate and update each row
df.apply(lambda row: translate_and_update(row), axis=1)

print("All rows updated.")

poiId 134029217 updated.
poiId 81803 updated.
poiId 10759495 updated.
poiId 131170473 updated.
poiId 23909829 updated.
All rows updated.

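One thing to watch when this step is run beyond the first five rows: a row whose source `zoneName` is NULL never receives a `zoneName_ko`, so it matches `zoneName_ko IS NULL` on every run (per `df.info()`, most rows have no `zoneName`). A sketch of a stricter predicate that selects a row only when a source column has a value and its `_ko` counterpart is still empty. `pending_query` is a hypothetical helper and the three demo rows are made up.

```python
import sqlite3

def pending_query(table, columns, limit):
    """Build a SELECT for rows that have a source value without a translation yet."""
    conditions = " OR ".join(f"({c} IS NOT NULL AND {c}_ko IS NULL)" for c in columns)
    return f"SELECT poiId FROM {table} WHERE {conditions} LIMIT {int(limit)}"

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE poi_data "
    "(poiId INTEGER, poiName TEXT, zoneName TEXT, poiName_ko TEXT, zoneName_ko TEXT)"
)
demo.executemany(
    "INSERT INTO poi_data VALUES (?, ?, ?, ?, ?)",
    [
        (1, "牛岛", "涯月邑", None, None),       # nothing translated yet: pending
        (2, "牛岛", None, "우도", None),         # zoneName missing in the source: done
        (3, "牛岛", "涯月邑", "우도", "애월읍"),   # fully translated: done
    ],
)
query = pending_query("poi_data", ["poiName", "zoneName"], 5)
pending_ids = [row[0] for row in demo.execute(query)]
demo.close()

print(pending_ids)  # [1]
```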

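`translate_and_update` catches every exception and only prints it, so a transient API failure leaves that row untranslated until the next run. One possible remedy is to retry the request with exponential backoff. A minimal sketch; `call_with_retry` is a hypothetical helper, the delays are arbitrary, and catching bare `Exception` is a simplification (real code would likely narrow this to the client's rate-limit and connection errors).

```python
import time

def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**i seconds and try again."""
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** i)

# Demo with a function that fails twice, then succeeds; delays are recorded, not slept
attempts_seen = []
delays = []

def flaky():
    attempts_seen.append(1)
    if len(attempts_seen) < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = call_with_retry(flaky, attempts=3, base_delay=1.0, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

In the notebook, the wrapped callable would be the `client.chat.completions.create(...)` request, for example via a `lambda`.
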
### Check : Inspect the translated data

In [22]:
conn = sqlite3.connect(db_name)
query = f"""
SELECT * FROM {table_name}
WHERE zoneName_ko IS NOT NULL OR distanceStr_ko IS NOT NULL
"""
df_result = pd.read_sql(query, conn)
conn.close()

df_result

Unnamed: 0,poiId,poiName,commentCount,commentScore,districtName,zoneName,distanceStr,coverImageUrl,openStatus,latitude,longitude,detailUrl,poiName_ko,districtName_ko,zoneName_ko,distanceStr_ko
0,134029217,史努比乐园,315,4.7,济州市,城山日出峰/表善面,距市中心23.3km,https://dimg04.c-ctrip.com/images/1lo0j12000fm...,18:00闭园,33.444196,126.778305,https://you.ctrip.com/sight/jeju1446512/134029...,스누피랜드,제주 시,성산일출봉/표선면,시내 중심에서 23.3km
1,81803,牛岛,1379,4.7,济州市,城山日出峰/表善面,距市中心38.8km,https://dimg04.c-ctrip.com/images/100w1f000001...,,33.504298,126.954048,https://you.ctrip.com/sight/jeju1446512/25765....,우도,제주시,성산일출봉/표선면,시내 중심가에서 38.8km 거리
2,10759495,乱打秀(济州剧场),633,4.7,济州市,东门市场/济州市政府,距市中心6.0km,https://dimg04.c-ctrip.com/images/0101e12000as...,16:30开园,33.445597,126.547558,https://you.ctrip.com/sight/jeju1446512/140813...,난타쇼(제주극장),제주시,동문시장/제주시청,시내 중심에서 6.0km
3,131170473,ARTE全沉浸式美术馆-韩国济州,355,4.8,济州市,涯月邑,距市中心21.1km,https://dimg04.c-ctrip.com/images/0100612000f6...,,33.396475,126.344614,https://you.ctrip.com/sight/jeju1446512/571893...,ARTE 전immersive 미술관 - 한국 제주,제주시,아월읍,시내 중심에서 21.1km
4,23909829,济州Ecoland生态主题乐园,400,4.5,济州市,咸德/朝天/旧左,距市中心13.2km,https://dimg04.c-ctrip.com/images/100f1f000001...,17:00闭园,33.45553,126.668187,https://you.ctrip.com/sight/jeju1446512/178942...,제주 에코랜드 생태 테마파크,제주 시,함덕/조천/구좌,시 중심에서 13.2km


### Convert : df2csv, df2xlsx (Colab)

In [None]:
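# prompt: Save the df above as Excel and CSV files, using ctrip_jeju as the file name

from google.colab import files  # Colab-only helper for downloading files to the local machine

df_result.to_csv('ctrip_jeju.csv')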
files.download('ctrip_jeju.csv')

df_result.to_excel('ctrip_jeju.xlsx')
files.download('ctrip_jeju.xlsx')

### Convert : df2csv, df2xlsx (jupyter notebook)

In [23]:
df_result.to_csv('ctrip_jeju.csv')

df_result.to_excel('ctrip_jeju.xlsx')

### Check : Inspect the Korean column

In [24]:
df_result['poiName_ko']

0                          스누피랜드
1                             우도
2                      난타쇼(제주극장)
3    ARTE 전immersive 미술관 - 한국 제주
4                제주 에코랜드 생태 테마파크
Name: poiName_ko, dtype: object
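
In the results above, the same source value 济州市 comes back as both '제주 시' and '제주시' in `districtName_ko`, which can happen when every row is translated independently with a nonzero temperature. One possible remedy is to translate each distinct string once and reuse the result. A minimal sketch; `translate_fn` stands in for whatever function calls the API, and `fake_translate` with its lookup table is made up for the demo.

```python
def make_cached_translator(translate_fn):
    """Wrap translate_fn so each distinct source string is translated only once."""
    cache = {}

    def translate(text):
        if text not in cache:
            cache[text] = translate_fn(text)
        return cache[text]

    return translate

calls = []

def fake_translate(text):
    # Stand-in for an API call; records each request so the demo can count them
    calls.append(text)
    return {"济州市": "제주시", "光州": "광주"}.get(text, text)

translate = make_cached_translator(fake_translate)
results = [translate(name) for name in ["济州市", "济州市", "光州", "济州市"]]

print(results)     # ['제주시', '제주시', '광주', '제주시']
print(len(calls))  # 2
```

Besides giving consistent output, this also cuts the number of requests, since columns such as `districtName` repeat the same value across many rows.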