
mxm0312/libtglang


Telegram ML competition 2023

The task in this competition is to create a library that detects the programming or markup language of a code snippet.


My solution is inspired by this article about a char-level CNN for programming language classification.
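To make the "char-level" part concrete, here is a minimal sketch of how a snippet could be turned into a fixed-length sequence of character indices before being fed to a CNN. This is not taken from the repository: the alphabet (printable ASCII), the `MAX_LEN` value, and the names `CHAR_TO_INDEX` and `encode_snippet` are all assumptions made for illustration.

```python
import string

# Assumed alphabet: printable ASCII. Index 0 is reserved for padding
# and for characters outside the alphabet.
ALPHABET = string.printable
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

# Assumed fixed input length; the actual model may use a different value.
MAX_LEN = 1024


def encode_snippet(code, max_len=MAX_LEN):
    """Map a code snippet to a fixed-length list of character indices."""
    # Truncate long snippets and map unknown characters to 0.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in code[:max_len]]
    # Right-pad short snippets with 0 so every input has the same length.
    indices += [0] * (max_len - len(indices))
    return indices
```

A char-level model would typically pass such index sequences through an embedding or one-hot layer followed by 1D convolutions, so no language-specific tokenizer is needed.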

Steps:

  • First, I generated a dataset of ~190k labeled code snippets

The histogram below shows the number of code snippets for each language (the numbers on the X axis match the descriptions in the common.py file).

[Image: FDCSL]

  • Then I built a model and trained it on ~150k examples for 10 epochs
  • After training, the model reached 79% accuracy on the validation dataset (~37k samples)
  • Finally, I created a Telegram bot to test the model in real life
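The split and the accuracy figure in the steps above can be sketched as follows. This is only an illustration, not the repository's code: the helper names `train_val_split` and `accuracy` are hypothetical, and the 20% validation fraction is an assumption that roughly matches the ~150k / ~37k sizes mentioned.

```python
import random


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle samples and split them into (train, validation) lists.

    val_fraction=0.2 is an assumed value, not one stated in the README.
    """
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

With per-language sample counts as uneven as a histogram of scraped code tends to be, a stratified split (keeping each language's share equal in both parts) may give a more reliable validation score than the plain shuffle used here.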
