
design and implement scribe server v1 API #29

Merged
andrewtavis merged 14 commits into scribe-org:main from DeleMike:refactor/http-to-gin
Jul 13, 2025

Conversation

@DeleMike
Collaborator

@DeleMike DeleMike commented Jul 2, 2025

Contributor checklist


Description

This PR refactors the HTTP server from net/http to the Gin web framework, improving routing, middleware support, and extensibility. It also introduces a RESTful API design for handling language data.

Changes

  • Replaced net/http with Gin throughout the app
  • Renamed api/handler.go to api/handlers.go for consistency
  • Implemented language data API endpoints with CORS support
  • Added basic unit tests for handlers

Motivation

Gin provides a cleaner syntax for routing and middleware, along with better performance. This migration improves the codebase structure and sets a strong foundation for future API development.

Related issue

DeleMike added 4 commits July 2, 2025 03:13
- Add versioned API endpoints for language data retrieval
- Implement GET /v1/data/:lang for full language datasets
- Implement GET /v1/data-version/:lang for version metadata
- Add CORS middleware optimized for GET-only requests
- Include proper error handling with structured responses
- Add mock data support for English (en) language
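
As a rough illustration of the "CORS middleware optimized for GET-only requests" item, the sketch below wraps a handler so that only GET and preflight OPTIONS requests get through. It is written against net/http rather than Gin so that it runs with the standard library alone; the PR's actual middleware is not shown in this thread, and `corsGetOnly`, `statusFor`, and the wildcard origin are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// corsGetOnly wraps a handler and adds CORS headers for read-only access.
// Hypothetical helper: the name and the wildcard origin are assumptions.
func corsGetOnly(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", "*")
		w.Header().Set("Access-Control-Allow-Methods", "GET, OPTIONS")

		switch r.Method {
		case http.MethodOptions:
			// Preflight request: reply with headers only.
			w.WriteHeader(http.StatusNoContent)
		case http.MethodGet:
			next.ServeHTTP(w, r)
		default:
			// Everything else is rejected before it reaches the handler.
			w.Header().Set("Allow", "GET, OPTIONS")
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		}
	})
}

// statusFor sends one request through the middleware and returns the status code.
func statusFor(method string) int {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/v1/data/en", nil)
	corsGetOnly(hello).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet), statusFor(http.MethodOptions), statusFor(http.MethodPost))
}
```

Rejecting other methods inside the middleware keeps the read-only rule in one place; a real deployment would probably restrict the allowed origin instead of using a wildcard.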
@DeleMike DeleMike requested a review from axif0 July 2, 2025 03:07
@github-actions

github-actions Bot commented Jul 2, 2025

Thank you for the pull request! ❤️

The Scribe-Server team will do our best to address your contribution as soon as we can. If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider attending our bi-weekly Saturday dev syncs. It'd be great to meet you 😊

@github-actions

github-actions Bot commented Jul 2, 2025

Maintainer Checklist

The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)


@github-actions github-actions Bot left a comment


First PR Commit Check

  • The commit messages for the remote branch should be checked to make sure the contributor's email is set up correctly so that they receive credit for their contribution
  • The contributor's name and icon in remote commits should be the same as what appears in the PR
  • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Server repo (can be set with git config --global user.email "GITHUB_EMAIL")

@DeleMike
Collaborator Author

DeleMike commented Jul 2, 2025

Hi @andrewtavis and @axif0

I have migrated the project from net/http to Gin, which provides a cleaner syntax for routing and middleware, along with better performance.

I have set up the routes we will need:

  1. /data/:lang : Returns all available language data and the schema contract.
  2. /data-version/:lang : Returns the last modified dates per data type for a given language.

I have added basic test cases to the project, and the test coverage is 58.3%. That is, 58.3% of our code is exercised by tests. We can get this higher...

The next step is to connect the handlers to MariaDB so that they return real data.

What do you two think?

@DeleMike DeleMike changed the title from "Refactor/http to gin" to "design and implement scribe server v1 API" Jul 2, 2025
@DeleMike DeleMike requested a review from andrewtavis July 2, 2025 03:38
@andrewtavis
Member

Thanks so much for the PR, @DeleMike! 🎉 So amazing that the API is starting to take shape 😊

I'm realizing that we need to update the maintainer checklist here with a checkbox for making sure the CI passes. @DeleMike, could we also add the coverage report to the pr_ci workflow?

@andrewtavis
Member

That way we can easily track the progress of testing in PRs as we do in other projects :)

@DeleMike
Collaborator Author

DeleMike commented Jul 2, 2025

Thanks for the review @andrewtavis
I will look into that before moving forward.

Should that be a separate PR, or should I add it to this one?

@andrewtavis
Member

I think we can just add it here, @DeleMike, but if you'd like to make an issue for it as well and link that one to here that would be totally fine! 😊

@DeleMike
Collaborator Author

DeleMike commented Jul 2, 2025

Okay, that's fine. I will decide which one is better asap!

@axif0
Member

axif0 commented Jul 3, 2025

One thing I'd like to mention: we need to sanitize ISO codes and data types.

  • Assert that they are allowed values before they are put into string interpolation, to prevent SQL injection
  • The checks need to happen when getting data from MariaDB

Comment thread api/handlers_test.go Outdated
@DeleMike
Collaborator Author

DeleMike commented Jul 3, 2025

One thing I'd like to mention: we need to sanitize ISO codes and data types.

  • Assert that they are allowed values before they are put into string interpolation, to prevent SQL injection
  • The checks need to happen when getting data from MariaDB

Yeah, great point! And thanks!

Could you point me to an example? I don't fully understand.

@axif0
Member

axif0 commented Jul 3, 2025

Could you point me to an example? I don't fully understand.

We need a checker function like isValidLanguageCode that validates the user's request:

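import (
    "database/sql"
    "net/http"

    "github.com/gin-gonic/gin"
)

// db is assumed to be a *sql.DB opened elsewhere against MariaDB.
var db *sql.DB

// isValidLanguageCode reports whether lang is one of the allowed two-letter codes.
func isValidLanguageCode(lang string) bool {
    allowedLanguages := []string{"en", "es", "fr" /* , etc. */}
    if len(lang) != 2 {
        return false
    }
    for _, allowed := range allowedLanguages {
        if lang == allowed {
            return true
        }
    }
    return false
}

func getLanguageData(c *gin.Context) {
    lang := c.Param("lang")
    if !isValidLanguageCode(lang) {
        c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid language code"})
        return
    }
    query := "SELECT * FROM language_data WHERE iso_code = ?"
    rows, err := db.Query(query, lang) // Parameterized - safe!
    if err != nil {
        c.JSON(http.StatusInternalServerError, gin.H{"error": "Query failed"})
        return
    }
    defer rows.Close()
    // Scan rows and write the JSON response here.
}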

@DeleMike
Collaborator Author

DeleMike commented Jul 3, 2025

Oh! Of course, I will add these!
I was laying the groundwork first and will then move on to more concrete things.

Thanks!!

@DeleMike
Collaborator Author

DeleMike commented Jul 5, 2025

Hi @axif0 ,
can you suggest how we can get values for allowedLanguages?

func isValidLanguageCode(lang string) bool {
    allowedLanguages := []string{"en", "es", "fr", etc....}
    if len(lang) != 2 {
        return false
    }
    for _, allowed := range allowedLanguages {
        if lang == allowed {
            return true
        }
    }
    return false
}

Hard-coding allowedLanguages := []string{"en", "es", "fr", etc....} seems manual, and I'd like to improve on it.

Is this a better approach?

Let's say a user requests api.scribe-server/data/xy, where xy is a language code. The way our code is currently structured, we would look for data in the XY_LanguageData table and, if it does not exist, tell the user that this language is not available in Scribe-Server.

The drawback is that we hit the DB only to find there is no data, which is a wasted round trip.

But if we define a list, or a structure with faster lookups such as a map, then we can do something like this:

var iso639_1Codes = map[string]bool{"en": true, "fr": true, "de": true}

We can then quickly check whether the key exists. But we would still have to define the supported languages up front and append to that map later.

I'm trying to avoid maintaining a list of allowed languages by hand. Or is that inevitable?

At the end of the day, we will still need to check whether the table exists, because a language can be defined in the map while its table is missing from the DB.
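
The two-stage check described above (an in-memory allow-list first, then a table-existence check) could look roughly like the sketch below. The `resolveTable` and `lookupsFor` names, the `exists` callback standing in for a real database lookup, and the `ENLanguageData`-style table naming are all assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// supportedLanguages is a hand-maintained allow-list; the entries are examples.
var supportedLanguages = map[string]struct{}{"en": {}, "fr": {}, "de": {}}

// resolveTable returns the table name for a language code.
// exists stands in for a real database lookup (an assumption of this sketch).
func resolveTable(code string, exists func(table string) bool) (string, error) {
	// Stage 1: in-memory check, so unknown codes never reach the database.
	if _, ok := supportedLanguages[code]; !ok {
		return "", errors.New("unsupported language code")
	}
	// Stage 2: the code is known, but its table may still be missing.
	table := strings.ToUpper(code) + "LanguageData" // assumed naming scheme
	if !exists(table) {
		return "", errors.New("no data for language")
	}
	return table, nil
}

// lookupsFor reports the resolved table and how many database lookups were made.
func lookupsFor(code string, present map[string]bool) (string, int) {
	calls := 0
	table, _ := resolveTable(code, func(t string) bool {
		calls++
		return present[t]
	})
	return table, calls
}

func main() {
	present := map[string]bool{"ENLanguageData": true}
	fmt.Println(lookupsFor("en", present)) // known code, table present
	fmt.Println(lookupsFor("de", present)) // known code, table missing
	fmt.Println(lookupsFor("xy", present)) // unknown code, no lookup made
}
```

Because the table name is built only from a code that already passed the allow-list, no user-controlled text reaches the identifier, which also addresses the injection concern raised earlier in the thread.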

What do you think?

I hope this makes sense 🥲

cc: @andrewtavis

@axif0
Member

axif0 commented Jul 5, 2025

var iso639_1Codes = map[string]bool{"en": true, "fr": true, "de": true}

I think for now we can hard-code the list of available languages. In a later release, though, we'll need to fetch the available languages dynamically, ideally filtering out the ones that have no data. We'll also need an SQL injection checker function.

@DeleMike
Collaborator Author

DeleMike commented Jul 6, 2025

Alright, thanks, @axif0! I will improve my work and then see if I can lay the groundwork for the SQL checker.

You should expect a PR update sometime from Monday! 👌🏾

DeleMike added 3 commits July 7, 2025 00:25
- Added `/api/v1/data/:lang` to serve language-specific structured data
- Added `/api/v1/data-version/:lang` for last modified tracking of data types
- Added `/api/v1/languages` to list supported languages and their data types
- Refactored error messages into constants for reuse and clarity
- Defined contract version and date formatting in internal/constants

These endpoints form the foundation of the Scribe API contract and support
client-side syncing, schema introspection, and multi-language support.
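
To illustrate the "error messages into constants" and structured-response items above, here is a minimal standard-library sketch. The constant names, the message wording, and the `errorResponse` shape are assumptions; the PR's real definitions are not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Hypothetical message constants; the PR's real names and wording are not shown.
const (
	errInvalidLanguage  = "Invalid language code"
	errLanguageNotFound = "Language not found"
)

// errorResponse is an assumed shape for a structured error body.
type errorResponse struct {
	Error  string `json:"error"`
	Status int    `json:"status"`
}

// writeError sends a JSON error body with the given HTTP status.
func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	// Encoding errors are ignored here for brevity.
	_ = json.NewEncoder(w).Encode(errorResponse{Error: msg, Status: status})
}

// renderError returns the body produced for a given status and message.
func renderError(status int, msg string) string {
	rec := httptest.NewRecorder()
	writeError(rec, status, msg)
	return rec.Body.String()
}

func main() {
	fmt.Print(renderError(http.StatusBadRequest, errInvalidLanguage))
	fmt.Print(renderError(http.StatusNotFound, errLanguageNotFound))
}
```

Keeping the messages in constants means handlers and tests refer to the same strings, so a wording change cannot silently break an assertion.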
…int tests

Temporarily removed detailed tests for `/data/:lang` and `/data-version/:lang`
to unblock CI and stabilize test coverage. Added minimal tests for root
hello endpoint and CORS header presence. Full endpoint tests will be
restructured and reintroduced later.
@DeleMike
Collaborator Author

DeleMike commented Jul 7, 2025

Hi @axif0 and @andrewtavis, I have made an update to our API. You can access the draft plan here. It's more of a guide for this PR and also covers your previous comments.

Now, I have implemented the api/v1/data/:lang, api/v1/data-version/:lang and api/v1/languages endpoints.

  • The api/v1/data/:lang endpoint returns the full linguistic data for a specific language.
  • Each language has its own contract structure defining the available fields for different word types.
  • The api/v1/data-version/:lang endpoint provides last modification timestamps for tracking updates.
  • The api/v1/languages endpoint lists all available languages and their supported data types.
  • Field values can be null when not applicable for a particular word or language.
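
As a small illustration of the last bullet, one way to produce `null` for inapplicable fields in Go is to store values as `*string`, since a nil pointer is encoded as JSON `null`. The `entry` type and helper names below are hypothetical and only sketch the idea; they are not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry maps a field name to its value; a nil pointer encodes as JSON null.
type entry map[string]*string

// strPtr returns a pointer to s, for building non-null values.
func strPtr(s string) *string { return &s }

// encodeEntry marshals one entry; encoding/json sorts map keys alphabetically.
func encodeEntry(e entry) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	noun := entry{
		"singular": strPtr("movie"),
		"plural":   strPtr("movies"),
		"genitive": nil, // not applicable for this word
	}
	fmt.Println(encodeEntry(noun))
}
```

Emitting explicit nulls rather than omitting keys keeps every record in line with the language's contract, so clients can rely on a fixed set of fields per data type.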

See the example responses below, produced after I dragged and dropped ENLanguageData.sqlite and FRLanguageData.sqlite into the server and then ran make migrate.

Endpoint: api/v1/data/en

Full Response Structure

{
  "language": "en",
  "contract": {
    "version": "1.0.0",
    "updated_at": "2025-07-07",
    "fields": {
      "adjectives": {
        "comparative": "text",
        "lastModified": "timestamp",
        "pastparticiple": "text",
        "plural": "text",
        "positive": "text",
        "presentparticiple": "text",
        "simplepast": "text",
        "simplepresent": "text",
        "simplepresentThirdPersonSingular": "text",
        "singular": "text",
        "superlative": "text",
        "wdLexemeId": "text"
      },
      "adverbs": {
        "comparative": "text",
        "contraction": "text",
        "lastModified": "timestamp",
        "positive": "text",
        "superlative": "text",
        "wdLexemeId": "text"
      },
      "nouns": {
        "femininePlural": "text",
        "feminineSingular": "text",
        "genitive": "text",
        "genitivePlural": "text",
        "genitiveSingular": "text",
        "lastModified": "timestamp",
        "masculinePlural": "text",
        "masculineSingular": "text",
        "nominative": "text",
        "nominativePlural": "text",
        "nominativeSingular": "text",
        "plural": "text",
        "pluralSingular": "text",
        "singular": "text",
        "wdLexemeId": "text"
      }
    }
  },
  "data": {
    "adjectives": [
      {
        "comparative": "weirder",
        "lastModified": "2024-05-07T12:00:42+01:00",
        "pastparticiple": null,
        "plural": null,
        "positive": "weird",
        "presentparticiple": null,
        "simplepast": null,
        "simplepresent": null,
        "simplepresentThirdPersonSingular": null,
        "singular": null,
        "superlative": "weirdest",
        "wdLexemeId": "L1064"
      }
    ],
    "nouns": [
      {
        "femininePlural": null,
        "feminineSingular": null,
        "genitive": null,
        "genitivePlural": null,
        "genitiveSingular": null,
        "lastModified": "2024-05-23T20:11:21+01:00",
        "masculinePlural": null,
        "masculineSingular": null,
        "nominative": null,
        "nominativePlural": null,
        "nominativeSingular": null,
        "plural": "movies",
        "pluralSingular": null,
        "singular": "movie",
        "wdLexemeId": "L2043"
      }
    ]
  }
}

Endpoint: api/v1/data/fr

Sample Response (French)

{
  "language": "fr",
  "data": {
    "adjectives": [
      {
        "comparative": null,
        "femininePlural": "soules",
        "feminineSingular": "soule",
        "lastModified": "2021-06-30T20:37:12+01:00",
        "masculine": null,
        "masculinePlural": "souls",
        "masculineSingular": "sou",
        "masculineSingularComparative": null,
        "plural": null,
        "singular": null,
        "wdLexemeId": "L10007"
      }
    ]
  }
}

Endpoint: api/v1/data-version/en

Response Structure

{
  "language": "en",
  "versions": {
    "adjectives_last_modified": "2025-07-07",
    "adverbs_last_modified": "2025-07-07",
    "conjunctions_last_modified": "2025-07-07",
    "nouns_last_modified": "2025-07-07",
    "personal_pronouns_last_modified": "2025-07-07",
    "postpositions_last_modified": "2025-07-07",
    "prepositions_last_modified": "2025-07-07",
    "pronouns_last_modified": "2025-07-07",
    "proper_nouns_last_modified": "2025-07-07",
    "verbs_last_modified": "2025-07-07"
  }
}
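
A client could use this response to decide which data types need re-downloading. The sketch below is one hedged way to do that, assuming the dates stay in `YYYY-MM-DD` form (so plain string comparison matches chronological order) and that the client keeps its own map of last-synced dates; `versionResponse` and `staleKeys` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// versionResponse mirrors the data-version response shown above.
type versionResponse struct {
	Language string            `json:"language"`
	Versions map[string]string `json:"versions"`
}

// staleKeys returns the version keys whose server date is newer than the local one.
// Dates are "YYYY-MM-DD", so string comparison matches chronological order.
// A key missing from local compares as "" and is therefore treated as stale.
func staleKeys(body []byte, local map[string]string) ([]string, error) {
	var resp versionResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var stale []string
	for key, serverDate := range resp.Versions {
		if serverDate > local[key] {
			stale = append(stale, key)
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	body := []byte(`{"language":"en","versions":{"nouns_last_modified":"2025-07-07","verbs_last_modified":"2025-07-07"}}`)
	local := map[string]string{
		"nouns_last_modified": "2025-07-07",
		"verbs_last_modified": "2025-06-01",
	}
	stale, err := staleKeys(body, local)
	fmt.Println(stale, err)
}
```

Checking versions first lets a client avoid requesting the full language payload when nothing has changed.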

Endpoint: api/v1/languages

Response Structure

{
  "languages": [
    {
      "code": "en",
      "data_types": [
        "adjectives",
        "adverbs",
        "conjunctions",
        "nouns",
        "personal_pronouns",
        "postpositions",
        "prepositions",
        "pronouns",
        "proper_nouns",
        "verbs"
      ]
    },
    {
      "code": "fr",
      "data_types": [
        "adjectives",
        "adverbs",
        "articles",
        "conjunctions",
        "nouns",
        "personal_pronouns",
        "prepositions",
        "pronouns",
        "proper_nouns",
        "verbs"
      ]
    }
  ]
}

TO TEST LOCALLY

  1. Install MariaDB and set up your database and your config.yaml file.
  2. Copy any of the Scribe-Data .sqlite files into Scribe-Server.
  3. Run the migration via make migrate.
  4. Run the server app.
  5. Test the endpoints.

⚠️❗️ The api/v1/data/:lang endpoint returns a very large amount of data!


Please let me know if you need more explanation.

Some video demo:

Screen.Recording.2025-07-07.at.12.15.28_720.mov

NEXT STEPS

After this PR, I don't know whether we should:

  1. Go straight to making Scribe-Server run Scribe-Data locally, following the plan stated in the doc above.
    Plan:
`scribe-data g -a -wdp` → SQLite files → Go migration → MariaDB
                                                    ↓
                                    Update language_data_versions table
  2. Add tests before we go to 1
  3. Add documentation before we go to 1

Which one?

@axif0 axif0 self-requested a review July 8, 2025 13:31
@axif0
Member

axif0 commented Jul 10, 2025

A few suggestions that come to mind:

  • Remove the OpenAPI packages.
  • Remove the sqlc library.
  • Can we reorganize the code in handler.go? It currently mixes validation, sanitization, and HTTP handling in one place. I have a suggestion along these lines (feel free to improve it if needed):
api/
├── handlers/
│   ├── language.go           # funcs like `getLanguageVersion`, `getAvailableLanguages`
│   └── common.go             # Common handler utilities
├── validators/
│   └── language_validator.go # isValidLanguageCode func
├── database_queries/
│   ├── language_validator.go # isValidLanguageCode func
│   ├── connection.go         # Database connection management
│   ├── language_queries.go   # Language-specific queries
│   ├── table_operations.go   # Generic table operations
│   ├── version_management.go # Version tracking functionality
│   └── utils.go              # Utility functions
├── routes.go                 # can add health check endpoints ('/', hello func)
└── middleware.go             # Middleware (keep existing)

The routes work well on my machine. Nice work @DeleMike 🚀 🔥

@DeleMike
Collaborator Author

Thanks @axif0!
I will try to restructure the project flow.

Which next step should we take? I listed the options in my previous comment. Do we do 1, 2 or 3? Thanks!


@axif0 axif0 requested a review from henrikth93 July 11, 2025 16:28
@axif0
Member

axif0 commented Jul 11, 2025

It works well on my machine. Could someone review this PR? @henrikth93? 👀

@henrikth93
Member

It works well on my machine. Could someone review this PR? @henrikth93? 👀

I can check!

Member

@andrewtavis andrewtavis left a comment


Approving given the call that we're doing right now with @DeleMike, @axif0 and @henrikth93 🎉 Thanks all so much for the amazing work here! 😊

@andrewtavis andrewtavis merged commit 8da2db7 into scribe-org:main Jul 13, 2025
2 checks passed
@DeleMike DeleMike deleted the refactor/http-to-gin branch July 13, 2025 20:46

Development

Successfully merging this pull request may close these issues.

Design the API specification for Scribe-Server

4 participants