pytorch-labs / gpt-fast Public


Issues: pytorch-labs/gpt-fast


66 Open 38 Closed


Issues list

Has anyone run this code with bs>1 and speculatively?

#214 opened Oct 25, 2024 by deafTim

Mistake on line 191 of generate.py when is_speculative=True?

#213 opened Oct 23, 2024 by deafTim

Error with meta-llama/Llama-3.2-1B

#211 opened Oct 18, 2024 by deafTim

Request for Smaller Model Options (~1B Parameters)

#210 opened Oct 17, 2024 by deafTim

Error with stories15M and stories110M

#209 opened Oct 17, 2024 by deafTim

Reasons for the poor effect of Speculative Sampling

#198 opened Aug 26, 2024 by JoeNan1

Activation quantization support

#194 opened Aug 12, 2024 by ayyoobimani

trying to convert huggingface whisper model to pytorch

#189 opened Jul 8, 2024 by nullonesix

tokenizer.model

#186 opened Jun 27, 2024 by hasakikiki

It doesn't accelerate very well at L4

#185 opened Jun 25, 2024 by songh11

getting different acceptance prob when using torch.compile after making a small change.

#184 opened Jun 22, 2024 by kalradivyanshu

GGUF support?

#182 opened Jun 14, 2024 by yukiarimo

Missing Keys in state_dict

#172 opened May 6, 2024 by bjohn22

Tensor Parallel Inside notebook

#167 opened Apr 29, 2024 by nivibilla

mmap issue in bf16 of gpt-fast

#165 opened Apr 28, 2024 by yanbing-j

Naming: n_local_heads -> n_kv_heads

#162 opened Apr 23, 2024 by ad8e

int8 Woq raise Codegen Error with --compile_prefill

#144 opened Mar 22, 2024 by yanbing-j

Question about large sequence length attention kernels

#140 opened Mar 19, 2024 by loubbrad

CUDA error if enabling compile_prefill for quantization model (int8)

#137 opened Mar 14, 2024 by yanboliang

Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization

#127 opened Mar 8, 2024 by daniyal214

index out of range: No transformer config could be loaded

#126 opened Mar 8, 2024 by SinanAkkoyun

Question about the generated code of WeightOnlyInt8Linear

#114 opened Feb 29, 2024 by feiyuvl

batching/dynamic batching

#112 opened Feb 27, 2024 by nivibilla

Try Tensor Parallel on a server equipped with two V100 linked by NVLINK, but got a performance degradation

#111 opened Feb 27, 2024 by duanzhaol

What happens to bias during int8 quantization?

#108 opened Feb 24, 2024 by gchhablani
