Commit 0503b1b: updates for 345M model

WuTheFWasThat committed May 3, 2019
1 parent d14501a

Showing 5 changed files with 9 additions and 5 deletions.
.gitignore: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 __pycache__
 .mypy_cache/
+models/
DEVELOPERS.md: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ pip3 install -r requirements.txt
 Download the model data
 ```
 python3 download_model.py 117M
+python3 download_model.py 345M
 ```

 ## Docker Installation
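download_model.py saves each model under models/<size>/, which is what the new models/ entry in .gitignore keeps out of version control. After downloading, the directory typically looks like this (file names per the released checkpoints; treat the exact listing as illustrative):

```
ls models/345M
# checkpoint  encoder.json  hparams.json  model.ckpt.data-00000-of-00001
# model.ckpt.index  model.ckpt.meta  vocab.bpe
```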
Dockerfile.cpu: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M
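With the added line, the CPU image now bakes both checkpoints in at build time. A minimal build-and-run sketch (the gpt-2 image tag is an assumption):

```
docker build --tag gpt-2 -f Dockerfile.cpu .
docker run -it gpt-2 bash
```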
Dockerfile.gpu: 1 addition & 0 deletions
@@ -15,3 +15,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M
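The GPU image follows the same pattern but is run with the NVIDIA container runtime so the container can see the GPU (assumes nvidia-docker is installed; same hypothetical gpt-2 tag as above):

```
docker build --tag gpt-2 -f Dockerfile.gpu .
docker run --runtime=nvidia -it gpt-2 bash
```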
README.md: 5 additions & 5 deletions
@@ -2,23 +2,23 @@

 Code and samples from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).

-For now, we have only released a smaller (117M parameter) version of GPT-2.
+We have currently released small (117M parameter) and medium (345M parameter) versions of GPT-2.

 See more details in our [blog post](https://blog.openai.com/better-language-models/).

 ## Usage

-This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2-117M. While GPT-2-117M is less proficient than GPT-2-1.5B, it is useful for a wide range of research and applications which could also apply to larger models.
+This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

 ### Some caveats

-- GPT-2-117M robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2-117M for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
-- The dataset our GPT-2-117M was trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2-117M is likely to be biased and inaccurate as well.
+- GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
+- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
 - To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

 ### Work with us

-Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2-117M! We’re especially interested in hearing from and potentially working with those who are studying
+Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying
 - Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
 - The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

2 comments on commit 0503b1b

@johndpope

It doesn’t seem like the underlying architecture changed? I would have expected more neurons or something like that. If so, where are these changes?

@WuTheFWasThat
Contributor, Author

There are more neurons; you can see the parameters after you run `python3 download_model.py 345M`.
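For the curious, the hparams.json shipped with each checkpoint spells out the size difference. The values below are the commonly published hyperparameters for the small and medium models; verify against your local copies:

```
cat models/117M/hparams.json
# {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_layer": 12}
cat models/345M/hparams.json
# {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 1024, "n_head": 16, "n_layer": 24}
```

So the medium model is wider (1024-dimensional embeddings vs 768) and twice as deep (24 layers vs 12), with the same vocabulary and context length.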
