Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update code to support TA - Tamil language #33

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
101 changes: 101 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,104 @@ cargo run -- <args>
```

where `<args>` are the command-line arguments you would pass the DidYouMean binary. Note that this is an unoptimized build contianing debug information so it runs much, much slower.
Adding New Language
===================
Update the language code in the list at


Update the language wordlist
============================
```
$ cargo run -- --lang ta --update-langs --wordlist-url https://raw.githubusercontent.com/arcturusannamalai/wordlists/main/
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/dym --lang ta --update-langs --wordlist-url 'https://raw.githubusercontent.com/arcturusannamalai/wordlists/main/'`
Downloading English word list...
Accessing URL: https://raw.githubusercontent.com/arcturusannamalai/wordlists/main//en
[00:00:00] [############################################################################################################################] 4.12MiB/4.12MiB (0s)
Downloading Tamil word list...
Accessing URL: https://raw.githubusercontent.com/arcturusannamalai/wordlists/main//ta
[00:00:00] [############################################################################################################################] 1.73MiB/1.73MiB (0s)
```

Run did you mean:
================
e.g.
```
cargo run -- --lang ta கலஅ
Compiling didyoumean v1.1.4 (/Users/user/devel/rust-in-action/didyoumean)
Finished dev [unoptimized + debuginfo] target(s) in 1.18s
Running `target/debug/dym --lang ta 'கலஅ'`
Did you mean?
1. கலி
2. கலை
3. கல்
4. அ
5. அகல்

```


For more info see the help text and options,
```
$ cargo run -- --help

didyoumean user$ cargo run -- --help
Compiling didyoumean v1.1.4 (/Users/user/devel/rust-in-action/didyoumean)
Finished dev [unoptimized + debuginfo] target(s) in 2.24s
Running `target/debug/dym --help`
didyoumean 1.1.4
Hisbaan Noorani
Did You Mean: A cli spelling corrector

USAGE:
dym [OPTIONS] [SEARCH_TERM]

ARGS:
<SEARCH_TERM>


OPTIONS:
-c, --clean-output
Print a clean version of the output without the title, numbers or colour.

-h, --help
Print help information

-l, --lang <LANG>
Select the desired language using its locale code. For example, English would have the
locale code en and French would have the locale code fr. See --print-langs for a list of
locale codes and the corresponding languages.

[default: en]

-n, --number <NUMBER>
Change the number of words the program will print. The default value is five.

[default: 5]

--print-langs
Display a list of supported languages and their respective locale codes.

--update-langs
Update all language files from the repository specified by CLI @wordlist-url@.

-v, --verbose
Print verbose output including the edit distance of the found word to the queried word.

-V, --version
Print version information

-w, --wordlist-url <WORDLIST_URL>
Wordlist repository URL. The default value is
'https://raw.githubusercontent.com/hisbaan/wordlists/main'

[default: https://raw.githubusercontent.com/hisbaan/wordlists/main]

-y, --yank
Yank (copy) the selected word to the system clipboard. If no word is selected, the
clipboard will not be altered.

```



Comment on lines +69 to +169
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these changes to the README.

  1. The information is not needed in the README. Anyone can access it via the help command or the man page where it is better formatted and more general.
  2. This information is not well formatted. You use specific to Tamil commands to showcase functionality, these (if it was to be included) should be general commands with placeholders for specific values. Also, when showing these off they should generally be using the binary to run these commands and

10 changes: 9 additions & 1 deletion src/cli.rs
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,15 @@ pub struct Cli {
#[clap(
long = "update-langs",
help = "Update all language files",
long_help = "Update all language files from the repository https://github.com/hisbaan/wordlists."
long_help = "Update all language files from the repository specified by CLI @wordlist-url@."
)]
pub update_langs: bool,
#[clap(
short = 'w',
long = "wordlist-url",
help = "Wordlist repository URL",
long_help = "Wordlist repository URL. The default value is 'https://raw.githubusercontent.com/hisbaan/wordlists/main'",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't list the default value here, it's already shown below and output by the help menu

For example:

image

Also relating to this, I found a place where I had made this same mistake, so if possible could you please also correct this? (line 13 of cli.rs please remove "The default value is 5")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust has a sensible affordance here for CLI help to also print the default; this is not the case in C++, Python @ argparse etc. nice to know!

default_value = "https://raw.githubusercontent.com/hisbaan/wordlists/main"
)]
pub wordlist_url: String,
}
2 changes: 2 additions & 0 deletions src/langs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ pub static LOCALES: phf::Map<&'static str, &'static str> = phf_map! {
"st" => "Sesotho",
"sv" => "Swedish",
"sw" => "Swahili",
"ta" => "Tamil",
"tg" => "Tajik",
"th" => "Thai",
"tk" => "Turkmen",
Expand Down Expand Up @@ -129,6 +130,7 @@ pub static SUPPORTED_LANGS: phf::Map<&'static str, &'static str> = phf_map! {
"st" => "Sesotho",
"sv" => "Swedish",
"sw" => "Swahili",
"ta" => "Tamil",
"tg" => "Tajik",
"tk" => "Turkmen",
"tl" => "Tagalog",
Expand Down
1 change: 1 addition & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ pub fn yank(string: &str) {
"freebsd",
"netbsd",
"dragonfly",
"mac osx",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason "mac osx" was not on this list before is that (as far as I know) it handles clipboard functionality just fine without this hack. This is not a "supported operating systems" list but instead a list of operating systems for which a certain workaround is required. Please remove it from this list.

"netbsd",
"openbsd",
"solaris",
Expand Down
24 changes: 12 additions & 12 deletions src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,11 @@ fn run_app() -> std::result::Result<(), Error> {

// Update all downloaded languages.
if args.update_langs {
update_langs();
assert!(
!args.wordlist_url.ends_with("/"),
"URL should end with branch name in github without trailing /"
);
Comment on lines +61 to +64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but please use the proper error handling instead of just asserting. You can see an example a little below in the code or right here:

            let error = cmd.error(
                clap::ErrorKind::MissingRequiredArgument,
                format!(
                    "The {} argument was not provided.\n\n\tEither provide it as an argument or pass it in from standard input.",
                    "<SEARCH_TERM>".green()
                )
            );
            clap::Error::exit(&error);

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also note that the repository doesn't necessarily have to be on GitHub so the error message should be something like

The provided wordlist URL must not end with a trailing '/'

update_langs(args.wordlist_url);
std::process::exit(0);
}

Expand Down Expand Up @@ -88,7 +92,7 @@ fn run_app() -> std::result::Result<(), Error> {
}

if SUPPORTED_LANGS.contains_key(args.lang.as_str()) {
fetch_word_list(args.lang.to_owned());
fetch_word_list(args.lang.to_owned(), args.wordlist_url.to_owned());
} else {
// Not supported
// Initialize new command.
Expand All @@ -110,14 +114,13 @@ fn run_app() -> std::result::Result<(), Error> {
// Exit with error.
clap::Error::exit(&error);
}

// Get word list. The program will only get here if/when this is a valid word list.
let word_list = read_to_string(dirs::data_dir().unwrap().join("didyoumean").join(args.lang))
.expect("Error reading file");

// Get dictionary of words from words.txt.
let dictionary = word_list.split('\n');

//assert!(dictionary.clone().count()>20,"Size of wordlist > 20 words");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was used in debugging? Please remove this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still new to Rust as you can see from the clone-ing. Sorry.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I really appreciate the effort and desire to make software better :)

// Create mutable vecs for storing the top n words.
let mut top_n_words = vec![""; args.number];
let mut top_n_dists = vec![search_term.len() * 10; args.number];
Expand Down Expand Up @@ -224,7 +227,7 @@ fn run_app() -> std::result::Result<(), Error> {
///
/// * `lang` - A locale code string to define the word list file to fetch.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment string here with the new parameter.

#[tokio::main]
async fn fetch_word_list(lang: String) {
async fn fetch_word_list(lang: String, wordlist_url: String) {
// Get data directory.
let data_dir = dirs::data_dir().unwrap().join("didyoumean");

Expand All @@ -243,11 +246,8 @@ async fn fetch_word_list(lang: String) {
LOCALES.get(&lang).unwrap().to_string().blue()
);

let url = format!(
"https://raw.githubusercontent.com/hisbaan/wordlists/main/{}",
&lang
);

let url = format!("{}/{}", wordlist_url, &lang);
println!("Accessing URL: {}", url);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra print is not required as it adds extra clutter and doesn't give any new information to the user since they know the URL they added.

// Setup reqwest.
let response = get(&url).await.expect("Request failed");
let total_size = response.content_length().unwrap();
Expand Down Expand Up @@ -280,7 +280,7 @@ async fn fetch_word_list(lang: String) {
}

/// Update the word list files by deleting and downloading the files from the repository.
fn update_langs() {
fn update_langs(wordlist_url: String) {
let data = data_dir().unwrap().join("didyoumean");

// Create data directory if it doesn't exist.
Expand All @@ -299,7 +299,7 @@ fn update_langs() {
// Only delete and download if the language is supported.
if SUPPORTED_LANGS.contains_key(string) {
remove_file(data.join(&string)).expect("Failed to update file (deletion failed)");
fetch_word_list(string.to_string());
fetch_word_list(string.to_string(), wordlist_url.to_string());
}
}
}