# Library

In [None]:
!pip install llama-stack

In [2]:
import pandas as pd
import numpy as np
import requests
import json

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans, DBSCAN
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.nn.functional as F
from datasets import load_dataset
from torch.utils.data import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    # pipeline,
)






# LLaMA

## Paper

LLaMA1: https://arxiv.org/pdf/2302.13971 <br>
LLaMA2: https://arxiv.org/pdf/2307.09288 <br>
LLaMA3: https://arxiv.org/pdf/2407.21783 <br>

### LLaMA1

#### Pre-training data

A SOTA model can be trained using only publicly available datasets.

<img src="https://mohitmayank.com/a_lazy_data_science_guide/imgs/nlp_llama_dataset.png" width="400" height="200">


#### Architecture

##### Pre-normalization

<img src="https://blogfiles.pstatic.net/MjAyNDA1MDdfMjQy/MDAxNzE1MDcyNTQ0NDEz.iFy-q1SUw5oojqAIClOW6zH0nt9tp4ODhJjNTsf-SK4g.n50Lkciuk5JJ1PVcpCTAQT7HZTZXVmnXakRibk9peUEg.PNG/image.png?type=w1" width="400" height="400">

LayerNorm -> RMSNorm <br>

<br>

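||LayerNorm|RMSNorm|
|-|---------|-------|
|Normalization (RMSNorm drops the centering step)| $(x - \mu) / \sqrt{\sigma^2 + \epsilon}$ | $x / \sqrt{\overline{x^2} + \epsilon}$ |
|Statistic in the denominator| variance $\sigma^2 = \frac{1}{n}\sum_i (x_i - \mu)^2$ | mean square $\overline{x^2} = \frac{1}{n}\sum_i x_i^2$ |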

<br>

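<font style="font-size:16px"> Advantages </font>

1. Computational efficiency: the mean square $\overline{x^2} = \frac{1}{n}\sum_i x_i^2$ is simpler to compute than the variance $\sigma^2$, because no mean has to be subtracted.
2. Scale invariance: the output range stays the same regardless of the scale of the input.
3. Normalization: the input is rescaled to a suitable range, which improves training stability.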

<br>

<font style="font-size:16px"> Disadvantage </font>

- Slightly reduced expressiveness. <br>
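
To make the LayerNorm vs. RMSNorm comparison concrete, here is a minimal pure-Python sketch. It assumes a single 1-D vector and omits the learnable gain and bias that real implementations carry; the function names `layer_norm` and `rms_norm` are illustrative, not a library API.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm without learnable parameters: center, then divide by the std."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    """RMSNorm without learnable gain: divide by the root mean square, no centering."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
y = rms_norm(x)
# Scaling the input by 10 leaves the RMSNorm output (almost) unchanged.
y_scaled = rms_norm([10.0 * v for v in x])
# LayerNorm output is centered around zero; RMSNorm output is not.
ln = layer_norm(x)
```

RMSNorm needs one pass for the mean square, while LayerNorm needs the mean first and then the variance, which is where the efficiency argument comes from.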

##### SwiGLU

Swish + GLU <br>

<br>

<font style="font-size:20px"> Swish </font>

<img src="https://production-media.paperswithcode.com/methods/Screen_Shot_2020-05-27_at_2.02.25_PM.png" width="400" height="300">

Formula: $f(x) = x \cdot \sigma(\beta x)$, where $\sigma$ is the sigmoid function

<br>

<font style="font-size:16px"> Characteristics </font>

1. $ x \geq 0 $ : unbounded above
2. $ x \lt 0 $ : bounded below
3. Non-monotonic
4. Smooth curve

<img src="https://velog.velcdn.com/images%2Fiissaacc%2Fpost%2F947a677f-6153-4c19-aeb4-4eb1d49c2ccf%2Fimage.png" width="600" height="200">
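
The properties listed above can be checked numerically. The sketch below assumes $\beta = 1$ by default; `sigmoid` and `swish` are illustrative helper names.

```python
import math

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

# Non-monotonic on the negative side: the value dips and then rises back toward 0.
dip = swish(-1.0)       # more negative than swish(-5.0)
tail = swish(-5.0)      # close to 0 again
# Bounded below: the minimum over a coarse grid stays above -0.3.
grid_min = min(swish(i / 100.0) for i in range(-2000, 2001))
# Unbounded above: for large positive x, swish(x) is close to x.
large = swish(10.0)
```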

<br>
<br>

<font style="font-size:20px"> Gated Linear Units (GLU) </font>

<img src="https://miro.medium.com/v2/resize:fit:1350/format:webp/1*5EKhmH8ilnAbTuqiJ-FQ8w.png" width="400" height="300">

<br>

formula: GLU($x, W, V, b, c$) = $\sigma(xW + b) \otimes (xV + c)$ 

<br>

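<font style="font-size:16px"> Characteristics </font>

1. Information-flow control: the sigmoid output acts as a gate that is multiplied element-wise with the other linear branch, controlling how much information passes through.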
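
As a rough illustration of the GLU formula, and of how SwiGLU swaps the sigmoid gate for Swish, here is a pure-Python sketch on small lists. The helper names (`linear`, `glu`, `swiglu`) and the toy weights are assumptions made for this example; production code would use tensor operations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def swish(z, beta=1.0):
    return z * sigmoid(beta * z)

def linear(x, W, b):
    """Compute xW + b for a vector x (length n), matrix W (n x m), bias b (length m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def glu(x, W, V, b, c):
    """GLU(x, W, V, b, c) = sigmoid(xW + b) * (xV + c), element-wise."""
    gate = [sigmoid(z) for z in linear(x, W, b)]
    value = linear(x, V, c)
    return [g * v for g, v in zip(gate, value)]

def swiglu(x, W, V, b, c):
    """Same structure as GLU, with Swish as the gate activation."""
    gate = [swish(z) for z in linear(x, W, b)]
    value = linear(x, V, c)
    return [g * v for g, v in zip(gate, value)]

# Toy inputs: xW = [0, 1] and xV = [1, -2].
x = [1.0, -2.0]
W = [[0.0, 1.0], [0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = [0.0, 0.0]

glu_out = glu(x, W, V, b, c)        # first gate is sigmoid(0) = 0.5
swiglu_out = swiglu(x, W, V, b, c)  # first gate is swish(0) = 0
```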


##### Rotary Embeddings

Encodes position information in a more refined way by applying a rotation transform to the dimensions of each vector.
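
A minimal sketch of the rotation idea, under some assumptions: adjacent dimensions are paired as `(2i, 2i+1)`, the frequency base is 10000, and the helper names `rope` and `dot` are made up for this example. Real implementations may pair dimensions differently and work on batched tensors.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t])
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# The score depends only on the relative offset (2 in both cases),
# not on the absolute positions.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 12), rope(k, 10))
```

Because a rotation preserves vector length, only the angle between query and key changes with position, which is why the attention score ends up a function of the relative offset.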

## Usage


### Getting access

**huggingface**
1. Request access to the model you want to use: https://huggingface.co/models
2. Create an access token: https://huggingface.co/settings/tokens
3. Run `huggingface-cli login` in a terminal
4. Enter the issued token

<br>

**meta**
1. Request access: https://www.llama.com/
2. Find the model ID with `llama model list`
3. Run `llama download --source meta --model-id CHOSEN_MODEL_ID`

### Model variants

|Llama 3.1|Llama 3.2|
|----|--------|
||1B|
||3B|
|8B|11B (multimodal)|
|70B|90B (multimodal)|
|405B||

<br>

Llama-3.2-1B: base (pre-trained) <br>
Llama-3.2-1B-Instruct: fine-tuned (chat applications) <br>

### Prompt Engineering

refer: [prompt engineering](https://www.llama.com/docs/how-to-guides/prompting/#prompting)

<br>

<font style="font-size:18px"> Special Token </font>

| Token | Description |
|--------------------------|--------------------------------------------------------------------------------------|
| <\|begin_of_text\|>     | Marks the start of the prompt. |
| <\|end_of_text\|>       | Signals that the model will stop generating tokens. Only base models generate this token. |
| <\|finetune_right_pad_id\|> | Used to pad sequences to the same length within a batch. |
| <\|start_header_id\|>, <\|end_header_id\|> | Enclose the role of a message. Possible roles: system, user, assistant, ipython. |
| <\|eom_id\|>            | End of message. Marks a stopping point where the model can signal that a tool call is needed. |
| <\|eot_id\|>            | End of turn. Indicates that the model considers its response to the user message complete. |
| <\|python_tag\|>        | Special tag used in the model's response to indicate a tool call. |

<br>
<br>

<font style="font-size:18px"> Role </font>

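| Role | Description |
|----------|------------------------|
| system   | Sets the context in which to interact with the AI model. Typically includes rules, guidelines, or information that helps the model respond effectively. |
| user     | The person interacting with the model. Contains the inputs, commands, and questions for the model. |
| ipython  | A new role introduced in Llama 3.1. Semantically it means "tool"; it marks the output of a tool call returned from the executor to the model. |
| assistant| The response generated by the AI model, based on the context provided in the system, ipython, and user prompts. |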
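
The special tokens and roles above combine into a prompt string in a regular way. The sketch below assembles such a string by hand to show the structure; `build_llama3_prompt` is a hypothetical helper written for this note, and the trailing blank line after the assistant header is an assumption about the expected layout. In practice `tokenizer.apply_chat_template` does this job.

```python
def build_llama3_prompt(messages, add_generation_prompt=True):
    """Assemble a Llama 3 style prompt from a list of {'role', 'content'} dicts."""
    parts = ['<|begin_of_text|>']
    for message in messages:
        # Each message: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn.
        parts.append('<|start_header_id|>assistant<|end_header_id|>\n\n')
    return ''.join(parts)

prompt_text = build_llama3_prompt([
    {'role': 'system', 'content': 'You are a helpful assistant'},
    {'role': 'user', 'content': 'What is the capital of France?'},
])
```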

#### Pretrained Model Prompt

> ```cmd
> <|begin_of_text|>{{ user_message}}
> ```

#### Instruct Model Prompt

> ```cmd
> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
> 
> Cutting Knowledge Date: December 2023
> Today Date: 23 July 2024
> 
> You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
> 
> What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
> ```

#### Code Interpreter

> ```cmd
> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
> 
> Environment: ipython<|eot_id|><|start_header_id|>user<|end_header_id|>
> 
> Write code to check if number is prime, use that to see if the number 7 is prime<|eot_id|><|start_header_id|>assistant<|end_header_id|>
> ```

#### User and assistant conversation

> ```cmd
> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
> 
> Cutting Knowledge Date: December 2023
> Today Date: 23 July 2024
> 
> You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
> 
> What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
> ```

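### Usage example

> ```python
> from transformers import AutoModelForCausalLM, AutoTokenizer
> 
> model_name = '<model_name>'  # placeholder: set this to the ID of a model you have access to
> tokenizer = AutoTokenizer.from_pretrained(model_name)
> model = AutoModelForCausalLM.from_pretrained(model_name)
> 
> # base model: tokenize raw text (returns input_ids and attention_mask)
> base_inputs = tokenizer('hello world', return_tensors='pt')
> 
> # instruct model: render the conversation with the chat template
> chat_ids = tokenizer.apply_chat_template(
>     conversation=[
>         {
>             'role': 'system',
>             'content': 'you are a useful robot',
>         },
>         {
>             'role': 'user',
>             'content': 'hello world',
>         },
>     ],
>     return_tensors='pt',
> )
> 
> # base: unpack the tokenizer output as keyword arguments
> model(**base_inputs)
> 
> # instruct: the token IDs are passed positionally as input_ids
> model(chat_ids)
> ```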

## Practice

### Huggingface

In [6]:
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B')
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B')

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')



#### Load

##### Llama-3.2-1B

In [None]:
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B')
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B')

In [None]:
prompt = '''
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

encoded = tokenizer.encode(prompt, return_tensors='pt')

In [None]:
tokenizer.decode(model.generate(encoded, max_length=64)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


'<|begin_of_text|>\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 23 July 2024\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is the capital of France?<|eot_id|>\nWhat is the capital of France?ें\n\nWhat is the capital of France?ें\n\nWhat is'

##### Llama-3.2-1B-Instruct

In [None]:
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct')

In [None]:
encoded = tokenizer.apply_chat_template([
    {
        'role': 'system',
        'content': 'You are a helpful assistant',
    },
    {
        'role': 'user',
        'content': 'What is the capital of France?',
    }
], return_tensors='pt')

In [None]:
print(tokenizer.decode(model.generate(encoded, max_length=128)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 21 Oct 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of France is Paris.<|eot_id|>
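
The decoded string above contains the whole prompt followed by the reply. A small sketch for pulling out only the last assistant message by splitting on the header tokens; `extract_assistant_reply` is a hypothetical helper, not part of `transformers`.

```python
import re

# Matches a role header and captures the role name.
HEADER = re.compile(r'<\|start_header_id\|>(\w+)<\|end_header_id\|>\s*')

def extract_assistant_reply(decoded):
    """Return the content of the last assistant message, or None if there is none."""
    # re.split with a capture group yields [prefix, role1, body1, role2, body2, ...].
    pieces = HEADER.split(decoded)
    roles, bodies = pieces[1::2], pieces[2::2]
    for role, body in reversed(list(zip(roles, bodies))):
        if role == 'assistant':
            # Cut at the first end-of-turn or end-of-message marker, if present.
            return re.split(r'<\|eot_id\|>|<\|eom_id\|>', body)[0].strip()
    return None

decoded_text = (
    '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
    'You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
    'What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
    'The capital of France is Paris.<|eot_id|>'
)
reply = extract_assistant_reply(decoded_text)
```

An alternative that avoids string parsing is to decode only the newly generated token IDs (everything after the prompt length) with `skip_special_tokens=True`.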


In [None]:
prompt = '''
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

encoded = tokenizer.encode(prompt, return_tensors='pt')
print(tokenizer.decode(model.generate(encoded, max_length=128)[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<|begin_of_text|>
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
The capital of France is Paris.<|eot_id|>
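
Note that the output above starts with two `<|begin_of_text|>` tokens separated by a newline. The hand-written prompt begins with a newline and a literal `<|begin_of_text|>`, and the tokenizer appears to prepend its own on top of that. One possible fix, sketched below, is to strip the literal token before encoding; `strip_literal_bos` is a hypothetical helper. Passing `add_special_tokens=False` to `tokenizer.encode` and removing the leading newline from the prompt should have a similar effect.

```python
BOS = '<|begin_of_text|>'

def strip_literal_bos(prompt):
    """Drop leading whitespace and one literal BOS so the tokenizer can add its own."""
    cleaned = prompt.lstrip()
    if cleaned.startswith(BOS):
        cleaned = cleaned[len(BOS):]
    return cleaned

raw_prompt = '''
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

# The cleaned prompt starts directly with the first role header.
cleaned_prompt = strip_literal_bos(raw_prompt)
```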


##### practice

Summarize the Investopedia S&P 500 article <br>
GPT vs. Llama <br>

In [31]:
news = '''
What Is the S&P 500 Index?
The S&P 500 Index or Standard & Poor's 500 Index is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index includes 503 components because three have two share classes listed.

It's not an exact list of the top 500 U.S. companies by market cap because the index includes other criteria. The S&P 500 index is nonetheless regarded as one of the best gauges of prominent American equities' performance and the stock market overall.

Key Takeaways
The S&P 500 Index features 500 leading U.S. publicly traded companies with a primary emphasis on market capitalization.
The S&P 500 Index was launched in 1957 by the credit rating agency Standard and Poor's.
The S&P is a float-weighted index. The market capitalizations of the companies in the index are adjusted by the number of shares available for public trading.
The S&P 500 is considered one of the best gauges of large U.S. stocks and even the entire equities market because of its depth and diversity.
You can't invest directly in the S&P 500 because it's an index but you can invest in one of the many funds that use it as a benchmark and track its composition and performance.
S&P 500 Index
Investopedia / Julie Bang

Weighting Formula and Calculation of the S&P 500
The S&P 500 uses a market-cap weighting method that gives a higher percentage allocation to companies with the largest market capitalizations.
1

Company Weighting in S&P = Company market cap / Total of all market caps
Determining the weighting of each component of the S&P 500 begins with calculating the total market cap for the index by adding together the market cap of every company in the index.

The market cap of a company is calculated by taking the current stock price and multiplying it by the company's outstanding shares. The total market cap for the S&P 500 as well as the market caps of individual companies are published frequently on financial websites, saving investors the need to calculate them.

The weighting of each company in the index is calculated by taking the company's market cap and dividing it by the total market cap of the index.
2


Other S&P Indices
The S&P 500 is a part of the S&P Global 1200 family of indices. Other indices include the S&P MidCap 400 which represents the mid-cap range of companies and the S&P SmallCap 600 which represents small-cap companies. The S&P 500, S&P MidCap 400, and S&P SmallCap 600 combine to cover 90% of all U.S. capitalization in an index known as the S&P Composite 1500.
3
4


S&P 500 Index Construction
The S&P uses only free-floating shares, the shares that the public can trade, when calculating market cap. The S&P adjusts each company's market cap to compensate for new share issues or company mergers.

The value of the index is calculated by totaling the adjusted market caps of each company and dividing the result by a divisor. The divisor is proprietary information of the S&P and isn't released to the public. The S&P Index (SPX) isn't a total return index and doesn't include cash dividend gains for the companies listed.
5

You can nonetheless calculate a company's weighting in the index and this can provide investors with valuable information. You can get a sense as to whether it might have an impact on the overall index if a stock rises or falls. A company with a 10% weighting would have a greater impact on the value of the index than a company with a 2% weighting.

The S&P 500 is one of the most widely quoted American indexes because it represents the largest publicly traded corporations in the U.S. It focuses on the U.S. market's large-cap sector and it's also a float-weighted index which is a type of capitalization weighting. Company market caps are adjusted by the number of shares available for public trading.
1

The S&P 500's most recent rebalancing was announced on March 1, 2024 and it took effect before the markets opened on March 18, 2024. Super Micro Computer and Deckers Outdoor replaced Whirlpool Corp. and Zions Bancorporation N.A. respectively at that time.
6

S&P 500 Competitors
S&P 500 vs. Dow Jones Industrial Average (DJIA)
Another common U.S. stock market benchmark is the Dow Jones Industrial Average (DJIA). The S&P 500 is often the institutional investor's preferred index given its depth and breadth. The DJIA has historically been associated with significant equities from the retail investor's point of view. Institutional investors perceive the S&P 500 as being more representative of U.S. equity markets because it includes more stocks across all sectors: 500 versus the Dow's 30.

The S&P 500 uses a market-cap weighting method that gives a higher percentage allocation to companies with the largest market caps. The DJIA is a price-weighted index that gives companies with higher stock prices a higher index weighting. The market-cap-weighted structure tends to be more common than the price-weighted index across U.S. indexes.
7

S&P 500 vs. Nasdaq
Nasdaq is a global electronic marketplace for trading securities. Several equity market indexes include stocks traded on Nasdaq. A given stock included in the S&P 500 Index may also be in one or more of the various Nasdaq indexes.

Some of the most-watched Nasdaq stock indices include:

Nasdaq 100 Index: Includes 100 of the largest, most actively traded common equities listed on Nasdaq
Nasdaq Composite Index: Often simply referred to as the Nasdaq by the media includes more than 2,500 common stocks that trade on Nasdaq
Nasdaq Global Equity Index (NQGI): Includes international stocks
PHLX Semiconductor Sector Index (SOX): The leading barometer of stocks related to the semiconductor industry
OMX Stockholm 30 Index (OMXS30): Includes 30 actively traded stocks on the Stockholm Stock Exchange
8

S&P 500 vs. Russell Indexes
The S&P 500 is a member of a set of indexes created by Standard & Poor's. This set of indexes is like the Russell index family in that both are market-cap-weighted unless stated otherwise as in the case of equal-weighted indexes.

There are two significant differences between the construction of the S&P and the Russell families of indexes. Standard & Poor's chooses constituent companies via a committee. Russell indexes use a formula to select which stocks to include. There's no name overlap within S&P style indices such as growth versus value. Russell indexes will include the same company in both the value and growth style indexes.
9
10

S&P 500 vs. Vanguard 500 Fund
The Vanguard 500 Index Fund aims to track the price and yield performance of the S&P 500 Index by investing its total net assets in the stocks that make up the index and by holding each component with approximately the same weight as the S&P index. The fund barely deviates from the S&P in this way, which it's designed to mimic.
11

The S&P 500 is an index so it can't be traded directly. Anyone who wants to invest in the companies that are included in the S&P must invest in a mutual fund or exchange-traded fund (ETF) that tracks the index such as the Vanguard 500 ETF (VOO).
Limitations of the S&P 500 Index
One of the limitations of the S&P and other market-cap-weighted indexes occurs when stocks in the index become overvalued. They rise higher than their fundamentals warrant. The stock typically inflates the overall value or price of the index if it has a heavy weighting in the index while being overvalued.

A company's rising market cap isn't necessarily indicative of its fundamentals. It simply reflects the stock's increase in value relative to the shares outstanding. Equal-weighted indexes have become increasingly popular as a result. Each company's stock price movements have an equal impact on these indexes.
12

Example of the S&P 500 Market Cap Weighting
The individual market weights must be calculated by dividing the market cap of each company by the total market cap of the index to understand how the underlying stocks affect the S&P index. Here's an example of Apple's weighting in the index:

Apple (AAPL) reported 15.7 billion shares outstanding in its quarterly filing for the period ending July 1, 2023 and it had a stock price of $173.93 at the end of the trading day on Sept. 21, 2023.
13
14
Apple's market cap was $2.7 trillion as of Sept. 21, 2023.
15
The S&P 500 total market cap was approximately $39.7 trillion as of Aug. 31, 2023. This is the sum of the market caps for all of the stocks in the index.
16
Apple's weighting in the index was approximately 6.8%, or $2.7 trillion divided by $39.7 trillion.
The larger the market weight of a company, the more impact each 1% change in a stock's price will have on the index. S&P doesn't provide the total list of all 503 components on its website, just the top 10.

Why Is It Called Standard and Poor's?
The first S&P Index was launched in 1923 as a joint project between the Standard Statistical Bureau and Poor's Publishing. The original index covered 233 companies. The two companies merged in 1941 to become Standard and Poor's.
17
7

What Companies Qualify for the S&P 500?
A company must be publicly traded and based in the United States to be included in the S&P 500 Index. It must also meet certain requirements for liquidity and market capitalization, have a public float of at least 10% of its shares, and have positive earnings over the trailing four quarters.
1

How Do You Invest in the S&P 500?
The simplest way to invest in the S&P 500 Index or any other stock market index is to buy shares of an index fund that targets it. These funds invest in a cross-section of the companies represented on the index so the fund's performance should mirror the performance of the index itself.

The Bottom Line
The S&P 500 Index is one of the most widely used indexes for the U.S. stock market. These 500 companies represent the largest and most liquid companies in the U.S. from technology and software companies to banks and manufacturers. The index has historically been used to provide insight into the direction of the stock market. It was created by a private company but the S&P 500 is a popular yardstick for the performance of the market economy at large.
'''

prompt = f'''
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{news}
위의 기사를 요약해줄래?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

inputs = tokenizer.encode(prompt, return_tensors='pt')
outputs = model.generate(inputs, max_length=4096)
print(tokenizer.decode(outputs[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<|begin_of_text|>
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>


What Is the S&P 500 Index?
The S&P 500 Index or Standard & Poor's 500 Index is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index includes 503 components because three have two share classes listed.

It's not an exact list of the top 500 U.S. companies by market cap because the index includes other criteria. The S&P 500 index is nonetheless regarded as one of the best gauges of prominent American equities' performance and the stock market overall.

Key Takeaways
The S&P 500 Index features 500 leading U.S. publicly traded companies with a primary emphasis on market capitalization.
The S&P 500 Index was launched in 1957 by the credit rating agency Standard and Poor's.
The S&P is a float-weighted index. The market capi

In [32]:
tokenizer = AutoTokenizer.from_pretrained('Bllossom/llama-3.2-Korean-Bllossom-3B')
model = AutoModelForCausalLM.from_pretrained('Bllossom/llama-3.2-Korean-Bllossom-3B')



In [33]:
news = '''
What Is the S&P 500 Index?
The S&P 500 Index or Standard & Poor's 500 Index is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index includes 503 components because three have two share classes listed.

It's not an exact list of the top 500 U.S. companies by market cap because the index includes other criteria. The S&P 500 index is nonetheless regarded as one of the best gauges of prominent American equities' performance and the stock market overall.

Key Takeaways
The S&P 500 Index features 500 leading U.S. publicly traded companies with a primary emphasis on market capitalization.
The S&P 500 Index was launched in 1957 by the credit rating agency Standard and Poor's.
The S&P is a float-weighted index. The market capitalizations of the companies in the index are adjusted by the number of shares available for public trading.
The S&P 500 is considered one of the best gauges of large U.S. stocks and even the entire equities market because of its depth and diversity.
You can't invest directly in the S&P 500 because it's an index but you can invest in one of the many funds that use it as a benchmark and track its composition and performance.
S&P 500 Index
Investopedia / Julie Bang

Weighting Formula and Calculation of the S&P 500
The S&P 500 uses a market-cap weighting method that gives a higher percentage allocation to companies with the largest market capitalizations.
1

Company Weighting in S & P
=
Company market cap
Total of all market caps
Company Weighting in S & P= 
Total of all market caps
Company market cap
​
 
Determining the weighting of each component of the S&P 500 begins with calculating the total market cap for the index by adding together the market cap of every company in the index.

The market cap of a company is calculated by taking the current stock price and multiplying it by the company's outstanding shares. The total market cap for the S&P 500 as well as the market caps of individual companies are published frequently on financial websites, saving investors the need to calculate them.

The weighting of each company in the index is calculated by taking the company's market cap and dividing it by the total market cap of the index.
2


Other S&P Indices
The S&P 500 is a part of the S&P Global 1200 family of indices. Other indices include the S&P MidCap 400 which represents the mid-cap range of companies and the S&P SmallCap 600 which represents small-cap companies. The S&P 500, S&P MidCap 400, and S&P SmallCap 600 combine to cover 90% of all U.S. capitalization in an index known as the S&P Composite 1500.
3
4


S&P 500 Index Construction
The S&P uses only free-floating shares, the shares that the public can trade, when calculating market cap. The S&P adjusts each company's market cap to compensate for new share issues or company mergers.

The value of the index is calculated by totaling the adjusted market caps of each company and dividing the result by a divisor. The divisor is proprietary information of the S&P and isn't released to the public. The S&P Index (SPX) isn't a total return index and doesn't include cash dividend gains for the companies listed.
5

You can nonetheless calculate a company's weighting in the index and this can provide investors with valuable information. You can get a sense as to whether it might have an impact on the overall index if a stock rises or falls. A company with a 10% weighting would have a greater impact on the value of the index than a company with a 2% weighting.

The S&P 500 is one of the most widely quoted American indexes because it represents the largest publicly traded corporations in the U.S. It focuses on the U.S. market's large-cap sector and it's also a float-weighted index which is a type of capitalization weighting. Company market caps are adjusted by the number of shares available for public trading.
1

The S&P 500's most recent rebalancing was announced on March 1, 2024 and it took effect before the markets opened on March 18, 2024. Super Micro Computer and Deckers Outdoor replaced Whirlpool Corp. and Zions Bancorporation N.A. respectively at that time.
6

S&P 500 Competitors
S&P 500 vs. Dow Jones Industrial Average (DJIA)
Another common U.S. stock market benchmark is the Dow Jones Industrial Average (DJIA). The S&P 500 is often the institutional investor's preferred index given its depth and breadth. The DJIA has historically been associated with significant equities from the retail investor's point of view. Institutional investors perceive the S&P 500 as being more representative of U.S. equity markets because it includes more stocks across all sectors: 500 versus the Dow's 30.

The S&P 500 uses a market-cap weighting method that gives a higher percentage allocation to companies with the largest market caps. The DJIA is a price-weighted index that gives companies with higher stock prices a higher index weighting. The market-cap-weighted structure tends to be more common than the price-weighted index across U.S. indexes.
7

S&P 500 vs. Nasdaq
Nasdaq is a global electronic marketplace for trading securities. Several equity market indexes include stocks traded on Nasdaq. A given stock included in the S&P 500 Index may also be in one or more of the various Nasdaq indexes.

Some of the most-watched Nasdaq stock indices include:

Nasdaq 100 Index: Includes 100 of the largest, most actively traded common equities listed on Nasdaq
Nasdaq Composite Index: Often simply referred to as the Nasdaq by the media includes more than 2,500 common stocks that trade on Nasdaq
Nasdaq Global Equity Index (NQGI): Includes international stocks
PHLX Semiconductor Sector Index (SOX): The leading barometer of stocks related to the semiconductor industry
OMX Stockholm 30 Index (OMXS30): Includes 30 actively traded stocks on the Stockholm Stock Exchange
8

S&P 500 vs. Russell Indexes
The S&P 500 is a member of a set of indexes created by Standard & Poor's. This set of indexes is like the Russell index family in that both are market-cap-weighted unless stated otherwise as in the case of equal-weighted indexes.

There are two significant differences between the construction of the S&P and the Russell families of indexes. Standard & Poor's chooses constituent companies via a committee. Russell indexes use a formula to select which stocks to include. There's no name overlap within S&P style indices such as growth versus value. Russell indexes will include the same company in both the value and growth style indexes.
9
10

S&P 500 vs. Vanguard 500 Fund
The Vanguard 500 Index Fund aims to track the price and yield performance of the S&P 500 Index by investing its total net assets in the stocks that make up the index and by holding each component with approximately the same weight as the S&P index. The fund barely deviates from the S&P in this way, which it's designed to mimic.
11

The S&P 500 is an index so it can't be traded directly. Anyone who wants to invest in the companies that are included in the S&P must invest in a mutual fund or exchange-traded fund (ETF) that tracks the index such as the Vanguard 500 ETF (VOO).
Limitations of the S&P 500 Index
One of the limitations of the S&P and other market-cap-weighted indexes occurs when stocks in the index become overvalued. They rise higher than their fundamentals warrant. The stock typically inflates the overall value or price of the index if it has a heavy weighting in the index while being overvalued.

A company's rising market cap isn't necessarily indicative of its fundamentals. It simply reflects the stock's increase in value relative to the shares outstanding. Equal-weighted indexes have become increasingly popular as a result. Each company's stock price movements have an equal impact on these indexes.
12

Example of the S&P 500 Market Cap Weighting
The individual market weights must be calculated by dividing the market cap of each company by the total market cap of the index to understand how the underlying stocks affect the S&P index. Here's an example of Apple's weighting in the index:

Apple (AAPL) reported 15.7 billion shares outstanding in its quarterly filing for the period ending July 1, 2023 and it had a stock price of $173.93 at the end of the trading day on Sept. 21, 2023.
13
14
Apple's market cap was $2.7 trillion as of Sept. 21, 2023.
15
The S&P 500 total market cap was approximately $39.7 trillion as of Aug. 31, 2023. This is the sum of the market caps for all of the stocks in the index.
16
Apple's weighting in the index was approximately 6.8%, or $2.7 trillion divided by $39.7 trillion.
The larger the market weight of a company, the more impact each 1% change in a stock's price will have on the index. S&P doesn't provide the total list of all 503 components on its website, just the top 10.

Why Is It Called Standard and Poor's?
The first S&P Index was launched in 1923 as a joint project between the Standard Statistical Bureau and Poor's Publishing. The original index covered 233 companies. The two companies merged in 1941 to become Standard and Poor's.

What Companies Qualify for the S&P 500?
A company must be publicly traded and based in the United States to be included in the S&P 500 Index. It must also meet certain requirements for liquidity and market capitalization, have a public float of at least 10% of its shares, and have positive earnings over the trailing four quarters.

How Do You Invest in the S&P 500?
The simplest way to invest in the S&P 500 Index or any other stock market index is to buy shares of an index fund that targets it. These funds invest in a cross-section of the companies represented on the index so the fund's performance should mirror the performance of the index itself.

The Bottom Line
The S&P 500 Index is one of the most widely used indexes for the U.S. stock market. These 500 companies represent the largest and most liquid companies in the U.S. from technology and software companies to banks and manufacturers. The index has historically been used to provide insight into the direction of the stock market. It was created by a private company but the S&P 500 is a popular yardstick for the performance of the market economy at large.
'''

prompt = f'''
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

{news}
Could you summarize the article above?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

inputs = tokenizer.encode(prompt, return_tensors='pt')
outputs = model.generate(inputs, max_length=4096)
print(tokenizer.decode(outputs[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


<|begin_of_text|>
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>


What Is the S&P 500 Index?
The S&P 500 Index or Standard & Poor's 500 Index is a market-capitalization-weighted index of 500 leading publicly traded companies in the U.S. The index includes 503 components because three have two share classes listed.

It's not an exact list of the top 500 U.S. companies by market cap because the index includes other criteria. The S&P 500 index is nonetheless regarded as one of the best gauges of prominent American equities' performance and the stock market overall.

Key Takeaways
The S&P 500 Index features 500 leading U.S. publicly traded companies with a primary emphasis on market capitalization.
The S&P 500 Index was launched in 1957 by the credit rating agency Standard and Poor's.
The S&P is a float-weighted index. The market capi
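
Two details of the run above are worth noting: the decoded text begins with `<|begin_of_text|>` twice, and `generate` warns that no attention mask or pad token was passed. The duplicate most likely appears because the prompt string already contains the BOS token and the tokenizer prepends another one by default. The following is a minimal sketch of one way to avoid both, not the notebook's original code. `build_prompt` and `generate_reply` are hypothetical helper names; `add_special_tokens`, `max_new_tokens`, and `pad_token_id` are existing Hugging Face arguments.

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a Llama 3 style chat prompt that contains exactly one BOS token."""
    return (
        '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
        f'{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
        f'{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
    )


def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a continuation and return only the newly generated text."""
    # The prompt already starts with <|begin_of_text|>, so do not add another.
    inputs = tokenizer(prompt, return_tensors='pt', add_special_tokens=False)
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the echoed prompt tokens before decoding.
    prompt_len = inputs['input_ids'].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)


prompt = build_prompt('You are a helpful assistant.', 'Summarize the article above.')
print(prompt.count('<|begin_of_text|>'))  # 1
```

With a loaded model, `generate_reply(model, tokenizer, prompt)` would return the answer without echoing the article, and `max_new_tokens` bounds the reply length independently of the prompt length.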

#### Fine-tuning

##### ChatbotData

In [None]:
model_name = 'meta-llama/Llama-3.2-1B'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer.pad_token = "<|finetune_right_pad_id|>"
tokenizer.pad_token_id = 128004

In [None]:
data = pd.read_csv('./ChatbotData.csv')

In [None]:
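# Llama 3 chat template. It starts directly with <|begin_of_text|> (no leading
# newline), and every header is followed by a blank line.
question_prompt = (
    '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
    'You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
    '{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
)

# Full training text: the formatted question, then the answer and the end-of-turn token.
answer_prompt = '{question_prompt}{answer}<|eot_id|>'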

In [None]:
data.Q = data.Q.apply(lambda x: question_prompt.format(user_prompt=x))
data.A = data.apply(lambda x: answer_prompt.format(question_prompt=x['Q'], answer=x['A']), axis=1)
data.A = data.A.str.replace('^\n', '', regex=True)

In [None]:
def tokenize(text: str):
    # The prompt templates already contain <|begin_of_text|>, so skip the
    # tokenizer's automatic BOS token to avoid a duplicate.
    return tokenizer(
        text,
        max_length=256,
        padding='max_length',
        truncation=True,
        return_tensors='pt',
        add_special_tokens=False,
    )

data['Q_tokenized'] = data.Q.apply(tokenize)
data['A_tokenized'] = data.A.apply(tokenize)

In [None]:
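class QADataset(Dataset):
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]

        # A causal LM is trained on one sequence (prompt + answer). Labels must
        # line up position by position with input_ids; the model shifts them
        # internally to predict the next token.
        input_ids = row.A_tokenized['input_ids'][0]
        attention_mask = row.A_tokenized['attention_mask'][0]

        # Padding positions are set to -100 so the loss ignores them.
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels,
        }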

In [None]:
train, valid = train_test_split(data, test_size=0.2, random_state=0)
train_dataset = QADataset(train)
eval_dataset = QADataset(valid)

In [None]:
training_args = TrainingArguments(
    output_dir='./llama/',
    eval_strategy='epoch',
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
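
When a causal LM is fine-tuned with `Trainer`, the labels follow the same token sequence as `input_ids`, and any position set to `-100` is excluded from the loss. A common refinement is to mask the prompt as well as the padding, so that only the answer tokens are learned. The sketch below shows that idea in plain Python with made-up token ids (not real Llama ids). `build_labels` and `prompt_len` are hypothetical names used for illustration.

```python
IGNORE_INDEX = -100  # value ignored by PyTorch's cross-entropy loss by default


def build_labels(input_ids, attention_mask, prompt_len):
    """Copy input_ids, masking prompt tokens and padding with IGNORE_INDEX."""
    labels = []
    for pos, (token, keep) in enumerate(zip(input_ids, attention_mask)):
        if pos < prompt_len or keep == 0:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token)
    return labels


# Toy sequence: 3 prompt tokens, 2 answer tokens, 2 padding tokens (id 0 here).
input_ids = [11, 12, 13, 21, 22, 0, 0]
attention_mask = [1, 1, 1, 1, 1, 0, 0]
labels = build_labels(input_ids, attention_mask, prompt_len=3)
print(labels)  # [-100, -100, -100, 21, 22, -100, -100]
```

In a real dataset, `prompt_len` could be taken from the number of tokens in the formatted question, provided the question and the full text are tokenized the same way.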

##### F Cafe (7,859)

Select 2,000 rows <br>
Split the rows into odd and even <br>
&nbsp;&nbsp;&nbsp;&nbsp;even (customer), odd (clerk) <br>
Recombine the split data (so that the customer-question column and the clerk-answer column sit in the same row) <br>
Apply the tokenizer <br>
train, test split <br>
Build the dataset <br>
Define training_args <br>
Define the trainer <br>
Train <br>

Integrate with Streamlit

In [38]:
data = pd.read_excel('./data/F 카페(7,859)_new.xlsx')

In [52]:
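# Concatenate the four text columns into one utterance per row
# (missing values are treated as empty strings).
data['joined'] = (
    data
    .loc[:, ['MQ', 'SQ', 'UA', 'SA']]
    .fillna('')
    .apply(lambda x: x['MQ'] + x['SQ'] + x['UA'] + x['SA'], axis=1)
)

# Even rows are customer utterances (Q); odd rows are clerk replies (A).
customer = data.iloc[::2].filter(items=['joined']).rename(columns={'joined': 'Q'})
clerk = data.iloc[1::2].filter(items=['joined']).rename(columns={'joined': 'A'})

# Pair each question with the reply that follows it; dropna() discards a
# trailing question that has no reply.
data = pd.concat(
    [customer.reset_index(drop=True), clerk.reset_index(drop=True)],
    axis=1,
).dropna()

# .copy() makes this an independent DataFrame, so adding columns later does
# not raise SettingWithCopyWarning.
data = data.iloc[:2000].copy()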
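
The even/odd pairing assumes that the log alternates strictly between customer and clerk. The toy sketch below illustrates that assumption with plain lists; the dialogue lines are made up and `pair_turns` is a hypothetical helper, not part of the notebook.

```python
def pair_turns(utterances):
    """Pair each even-indexed (customer) line with the odd-indexed (clerk) line after it."""
    questions = utterances[::2]
    answers = utterances[1::2]
    # zip stops at the shorter list, so a trailing unanswered question is dropped.
    return list(zip(questions, answers))


# Made-up dialogue lines for illustration only.
log = [
    'Do you have decaf?',
    'Yes, we do.',
    'How much is a latte?',
    'It is 4,500 won.',
    'Thanks!',
]
pairs = pair_turns(log)
print(pairs)
```

If one speaker ever has two consecutive rows, every later pair shifts by one, so it may be worth spot-checking a few pairs before training.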

In [66]:
model_name = 'meta-llama/Llama-3.2-1B'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = '<|finetune_right_pad_id|>'
tokenizer.pad_token_id = 128004

In [67]:
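# Llama 3 chat template. It starts directly with <|begin_of_text|> (no leading
# newline), and every header is followed by a blank line.
question_prompt = (
    '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n'
    'You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n'
    '{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
)

# Full training text: the formatted question, then the answer and the end-of-turn token.
answer_prompt = '{question_prompt}{answer}<|eot_id|>'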

In [None]:
data.Q = data.Q.apply(lambda x: question_prompt.format(user_prompt=x))
data.A = data.apply(lambda x: answer_prompt.format(question_prompt=x['Q'], answer=x['A']), axis=1)
data.A = data.A.str.replace('^\n', '', regex=True)

In [69]:
def tokenize(text: str):
    # The prompt templates already contain <|begin_of_text|>, so skip the
    # tokenizer's automatic BOS token to avoid a duplicate.
    return tokenizer(
        text,
        max_length=128,
        padding='max_length',
        truncation=True,
        return_tensors='pt',
        add_special_tokens=False,
    )

data['Q_tokenized'] = data.Q.apply(tokenize)
data['A_tokenized'] = data.A.apply(tokenize)


In [70]:
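class QADataset(Dataset):
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]

        # A causal LM is trained on one sequence (prompt + answer). Labels must
        # line up position by position with input_ids; the model shifts them
        # internally to predict the next token.
        input_ids = row.A_tokenized['input_ids'][0]
        attention_mask = row.A_tokenized['attention_mask'][0]

        # Padding positions are set to -100 so the loss ignores them.
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels,
        }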

In [71]:
train, valid = train_test_split(data, test_size=0.2, random_state=0)
train_dataset = QADataset(train)
eval_dataset = QADataset(valid)

In [None]:
training_args = TrainingArguments(
    output_dir='./llama/',
    eval_strategy='epoch',
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()

### Meta

In [None]:
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

In [None]:
tokenizer('hello world', return_tensors='pt')

# Ollama

site: https://ollama.com/

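A platform that makes it easy to run LLMs in a local environment. <br>
Users can install and run large language models on their own machines, which can improve privacy and speed. <br>
Ollama offers a range of features and tools that help developers build AI-powered applications. <br>
For example, you can call a model through the API or add a custom model. <br>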

## Usage

> ```
> ollama run llama3.2
> ollama run llama3.2:1b
> ```

### API

#### generate

> ```python
> url = 'http://localhost:11434/api/generate'
> data = {
>     'model': 'llama3.2',
>     'prompt': 'Why is the sky blue?'
> }
> 
> response = requests.post(url, json=data)
> ```
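
By default `/api/generate` streams its answer as newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag, so calling `response.json()` on the whole reply fails unless the request sets `'stream': False`. The sketch below shows one way to reassemble a streamed reply. `join_stream` is a hypothetical helper, and the sample lines are hand-written in the streamed shape rather than real model output.

```python
import json


def join_stream(lines):
    """Concatenate the 'response' fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        if not line:  # skip empty lines
            continue
        chunk = json.loads(line)
        parts.append(chunk.get('response', ''))
        if chunk.get('done'):
            break
    return ''.join(parts)


# Hand-written sample lines (not real model output).
sample = [
    '{"model": "llama3.2", "response": "Rayleigh ", "done": false}',
    '{"model": "llama3.2", "response": "scattering.", "done": false}',
    '{"model": "llama3.2", "response": "", "done": true}',
]
text = join_stream(sample)
print(text)  # Rayleigh scattering.
```

Against a running server, the same function could be fed `requests.post(url, json=data, stream=True).iter_lines()`. The simpler alternative is to add `'stream': False` to the request body and read `response.json()['response']`.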

#### chat

> ```python
> url = 'http://localhost:11434/api/chat'
> data = {
>     'model': 'llama3.2',
>     'messages': [
>         {'role': 'user', 'content': 'Why is the sky blue?'}
>     ]
> }
> 
> response = requests.post(url, json=data)
> ```
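
`/api/chat` keeps no conversation state on the server: each request carries the whole history in `messages`, and with streaming disabled the reply text is found under `message.content`. The sketch below shows how a multi-turn history might be built. `add_turn` and `chat_payload` are hypothetical helpers, and the assistant text is a placeholder rather than real model output.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more turn appended."""
    return messages + [{'role': role, 'content': content}]


def chat_payload(model, messages, stream=False):
    """Build the JSON body for a POST to /api/chat."""
    return {'model': model, 'messages': messages, 'stream': stream}


history = add_turn([], 'user', 'Why is the sky blue?')
# Placeholder text standing in for reply.json()['message']['content'].
history = add_turn(history, 'assistant', 'Because of Rayleigh scattering.')
history = add_turn(history, 'user', 'Explain that to a five-year-old.')

payload = chat_payload('llama3.2', history)
print(len(payload['messages']))  # 3
```

Sending `payload` with `requests.post(url, json=payload)` would then let the model answer the follow-up with the earlier turns as context.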

### Customizing

1\. Download the model

```cmd
ollama pull llama3.2
```

<br>

2\. Create a Modelfile as shown below

```cmd
FROM llama3.2

# Set the temperature (higher is more creative, lower is more deterministic)
PARAMETER temperature 1

# Set the system message
SYSTEM """
You are a kid. Answer like a kid.
"""
```

<br>

3\. Run the following commands

```cmd
ollama create kid -f ./Modelfile
ollama run kid
```
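
When several variants of a custom model are needed, the Modelfile text can also be generated from Python. The sketch below uses only the `FROM`, `PARAMETER`, and `SYSTEM` instructions shown above. `render_modelfile` is a hypothetical helper, and the temperature value is an arbitrary example.

```python
def render_modelfile(base: str, temperature: float, system: str) -> str:
    """Return Modelfile text with FROM, PARAMETER and SYSTEM instructions."""
    return (
        f'FROM {base}\n\n'
        f'PARAMETER temperature {temperature}\n\n'
        f'SYSTEM """\n{system.strip()}\n"""\n'
    )


modelfile = render_modelfile('llama3.2', 0.7, 'You are a kid. Answer like a kid.')
print(modelfile)
```

Writing `modelfile` to `./Modelfile` (for example with `pathlib.Path('Modelfile').write_text(modelfile)`) and then running `ollama create kid -f ./Modelfile` would register the variant as in step 3.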