🤗
hacking 🎧 , UTC +2

Highlights

  • Pro

Organizations

@dbmdz @flairNLP @Hugging-Face-Supporter @GermanT5

stefan-it/README.md

🤗

👋 Hi there

I'm now working at the Bavarian State Library 📚. Please visit, watch and star our pre-trained language models repo!

I'm currently working on the awesome Flair library and love contributing to 🤗 Transformers.

📰 Latest news

Latest news on new language models, PRs and much more!

  • 09.06.2022: Preprint of our upcoming HIPE-2022 Working Notes paper is now available here: hmBERT: Historical Multilingual Language Models for Named Entity Recognition.

  • 20.02.2022: Check out our new GermanT5 organization - expect new T5 models for German soon!

  • 14.12.2021: New badge: Member of Hugging Face Supporter org now 🎉

  • 13.12.2021: Release of Historic Language Model for Dutch (trained on Delpher corpus) - see repo here.

  • 06.12.2021: Release of smaller multilingual Historic Language Models (ranging from 2-8 layers) - see repo here.

  • 18.11.2021: Release of new multilingual and monolingual Historic Language Models - as preparation for upcoming CLEF-HIPE 2022 - see repo here.

  • 23.09.2021: Release of ConvBERTurk (cased and uncased) and ELECTRA (uncased) trained on Turkish part of mC4 corpus - see repo here.

  • 07.09.2021: Release of new larger German GPT-2 model - see model hub card here.

  • 17.08.2021: Release of new re-trained German GPT-2 model - see repo here.

  • 05.07.2021: Preprint of the ICDAR 2021 paper "Data Centric Domain Adaptation for Historical Text with OCR Errors" together with Luisa März, Nina Poerner, Benjamin Roth and Hinrich Schütze is out now!

  • 24.06.2021: The Turkish Language Model Zoo repo got a new logo from Merve Noyan, please follow her! Additionally, a new Turkish ELECTRA model, trained on the Turkish part of the multilingual C4 dataset, was released. More details here.

  • 03.05.2021: GC4LM: A Colossal (Biased) language model for German was released. Repo with more details here.

  • 27.04.2021: Our paper "Data Centric Domain Adaptation for Historical Text with OCR Errors" was accepted at ICDAR 2021. More details soon!

  • 16.03.2021: Turkish model zoo is still growing! Public release of ConvBERTurk - see repo here.

  • 07.02.2021: Public release of German Europeana DistilBERT and ConvBERT models. Repo with more information is here.

  • 28.01.2021: Expect a new German Europeana ELECTRA Large model incl. a distilled German Europeana BERT model soon 🤗

  • 16.11.2020: Public release of French Europeana BERT and ELECTRA models - see repository here.

  • 16.11.2020: Public release of a German GPT-2 model (incl. fine-tuned model on Faust I and II). Repo with more information is available here.

  • 11.11.2020: Public release of Ukrainian ELECTRA model. Repo is now available here.

  • 11.11.2020: New workstation build (RTX 3090 and Ryzen 9 5900X) is complete! Expect a lot of new Flair/Transformers models in the near future!

  • 02.11.2020: Public release of Italian XXL ELECTRA model. New repo for Italian BERT and ELECTRA models is now available here 🎉

  • 22.10.2020: Preprint of "German's Next Language Model" is now available here. Models are also available on the Hugging Face model hub 🎉

  • 22.10.2020: Our shared task paper Triple E - Effective Ensembling of Embeddings and Language Models for NER of Historical German together with Luisa März is released 🎉

  • 30.09.2020: "German's Next Language Model" together with Branden Chan and Timo Möller was accepted at COLING 2020! Expect new language models for German on the Hugging Face model hub soon 🤗

  • 23.09.2020: Flair in version 0.6.1 is out now!

  • 02.09.2020: Slow response time - I'm currently focussing on EACL 2021. Expect great new things 😎

  • 18.08.2020: French BERT model, trained on historic newspapers from Europeana: find the model here and the corresponding repository here.

📃 Publications

📃 Preprints

💬 Contact

Please open an issue in the corresponding repository or tag me (@stefan-it) in issues/PRs/commits on GitHub :)

You can also find me on the 🤗 Discussion forum.

Pinned

  1. Turkish BERT/DistilBERT, ELECTRA and ConvBERT models

    Python 333 27

  2. dbmdz/berts

    DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models

    128 11

  3. Experiments with Zalando's flair library

    Python 35 5

  4. Language Models for Zalando's flair library

    54 4

  5. Repository for "Towards Robust Named Entity Recognition for Historic German"

    Python 14 3

  6. General-Purpose Neural Networks for Sentence Boundary Detection

    Python 67 7

505 contributions in the last year


Contribution activity

August 2022

Created 1 repository

Created a pull request in flairNLP/flair that received 3 comments

embeddings: add support for T5 encoder models

Hi, this PR adds support for encoder-only fine-tuning T5 models. Supported models are T5, mT5 and LongT5. Unfortunately, ByT5 is currently not work…

+17 −2 3 comments
Opened 1 other pull request in 1 repository
bigscience-workshop/lam 1 merged
Reviewed 2 pull requests in 2 repositories
flairNLP/flair 1 pull request
dbmdz/solr-ocrhighlighting 1 pull request
