Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locale setting can be ignored #2563

Closed
hlascelles opened this issue Sep 15, 2022 · 32 comments · Fixed by #2802 or #2734
Closed

Locale setting can be ignored #2563

hlascelles opened this issue Sep 15, 2022 · 32 comments · Fixed by #2802 or #2734
Assignees

Comments

@hlascelles
Copy link

hlascelles commented Sep 15, 2022

The bug

Depending on how locale is set, it may be ignored in threaded server environments.

In particular, we use Faker at runtime in a deployed environment (using Puma) where a user can press a button and the server generates them fake data in a form.

To Reproduce

  1. Set up a Rails app with a threaded server (eg Puma)
  2. In a lib file, or an initializer, add a locale setting line:
    Faker::Config.locale = :'en-GB'
  3. Make a request to the server for fake data.
  4. You will receive fake data for the default locale :en, which is American (eg Zip codes) not :en-GB (eg Postcodes)

Cause

The cause is the new threaded safety: #2520 introduced in Faker 2.23.0.

Puma works by preloading the app, then process forking it (and also maybe generating new server threads). From the above reproduction steps, you can see that the Faker locale will be set in the ThreadLocal of the master process Thread.current[:faker_config_locale]. However, all the threaded child processes do not inherit the ThreadLocal value, so it is blank, and defaults to :en.

Reverting to Faker 2.22.0 solves the issue.

Possible solution

When the locale is set for the first time, it could become the new default instead of the logic here: https://github.com/faker-ruby/faker/blob/master/lib/faker.rb#L23

Open to other ideas!

@JuanVqz
Copy link

JuanVqz commented Oct 7, 2022

Hey Stefanni!!
I’d like to solve it but I want to understand it first, do you suggest reverting the changes in the faker.rb file from this PR will solve the issue?
If so, I can do it.

@fbuys
Copy link
Contributor

fbuys commented Oct 7, 2022

Hey @JuanVqz I am not totally familiar with threads. But what I can gather from the above is that we want to keep the threaded behavior but we want to make sure that the locale set in the config is shared between all threads.
I'll need to learn a bit more about threads to be more useful, but does this information help?

@JuanVqz
Copy link

JuanVqz commented Oct 7, 2022

Hey @JuanVqz I am not totally familiar with threads. But what I can gather from the above is that we want to keep the threaded behavior but we want to make sure that the locale set in the config is shared between all threads. I'll need to learn a bit more about threads to be more useful, but does this information help?

Hey @fbuys that makes sense, I'm not familiar with threads as well.

@sudeeptarlekar
Copy link
Contributor

But are we suppose to use faker on production? It is meant to be used for development and testing ENVs. Also Ruby3.1.2 has fiber which is similar to threads, worth taking a look.

@thdaraujo
Copy link
Contributor

Well, if people were already using it in production, we still want to support that behavior. Besides, they could also be running tests that match closely the behavior of their apps running in production (using Puma, threads, and everything else), so in my mind, it makes sense to support that.

@hbontempo-br
Copy link
Contributor

Hi!

Not very familiar with the inner workings of project yet, but seams like dry-configurable would help here.
Are you OK in bringing in a new dependency?

Not sure how to test this threaded behavior, but probably can put something together this weekend if you think this a good direction.

@hlascelles
Copy link
Author

I'll have a look at some solutions too.

NB, we aren't using it in the production, just QA - though others might be.

@thdaraujo
Copy link
Contributor

thdaraujo commented Oct 18, 2022

We were exploring some ideas here: hexdevs@7987810

The code kinda works. But it's a bit terrible, and it's not a long term solution.

I think a long term solution would be a configure method where you set the locale and other configurations you want to be used across threads. Which is close to what @hbontempo-br is proposing.

The nice thing about it is that Rails users could run it as an initializer, and it would solve the problem nicely.

It would look like this:

# on Rails, this would live in `/config/initializers/faker.rb` 

Faker.configure do |config|
  config.locale = :en
  (...)
end

# or maybe keep it inside the config module:
Faker::Config.setup do |config|
  (...)
end

Then the locale set on configure would be the new default locale, but different threads would still be able to overwrite it locally.

The tricky part is making this backwards compatible. But I think it's feasible. What do you think?

@mateusdeap
Copy link
Contributor

mateusdeap commented Oct 18, 2022

So, I did some research and found out that instance variables are shared between threads (read here).

Which explains why the information is shared in the previous implementation, and thus your use case worked, but it also still means that Faker wouldn't be thread safe.

So it seems like the solution would be a combination of the previous implementation and the current one using Thread

Just saw @thdaraujo's answer.

I could try and look into the configure method idea too.

@stefannibrasil
Copy link

I'm in favor of going with having a Faker configuration option. This is the standard for most gems already, and would solve the issues with threads.

@mateusdeap are you working on this? Could I assign this issue to you? Thanks!

@elliotcm
Copy link

elliotcm commented Nov 9, 2022

A +1 from us at https://github.com/DFE-Digital/apply-for-teacher-training as this is preventing us from moving beyond v2.22.0 of Faker.

Our "production" use is to generate data in a sandbox environment, and to generate temporary data for ActionMailer previews.

Happy to chip in if we can be of help.

@mateusdeap
Copy link
Contributor

Hey, @stefannibrasil sure thing! Sorry for the late response. Kind of got lost in work. But now I have time!

@stefannibrasil
Copy link

@mateusdeap I assigned the issue to you. Please let us know if you have any questions, or run into any issues.

@mateusdeap
Copy link
Contributor

Sure thing!

@mateusdeap
Copy link
Contributor

Just to update on this: I've made some progress, though not substantial. I'll need to pause work for a few days, however. But I'll be back on it in a week or so.

@stefannibrasil
Copy link

Sounds good! Let us know if you need any help.

@mateusdeap
Copy link
Contributor

Sure thing. I'm back from vacations now and will be working on this.

@mateusdeap
Copy link
Contributor

So after banging my head thoroughly against the Thread docs, I'm not sure how to write our little thread safe config.

As I understood the OPs original comment, Puma can actually fork new processes? Although I think I read somewhere that the Ruby VM (MRI) has a Global Instance Lock, so it doesn't actually fork new processes, and the Thread class just allows us to create different threads, but it's still all just one process... I'm I missing something?

That being said, I'll post in my next comment what I had imagined for our config setup, although I'm not sure if this would solve the issue with sharing the same info among all threads.

@mateusdeap
Copy link
Contributor

Ok, so my next comment on how I imagined this will be a little bit longer. Currently having issues figuring out how the test suite works. It's a mix of minitest and something else? I'm not understanding how to run just one test file at a time, basically. I was able to do it using just ruby -Ilib path/to/my/test.rb, but I don't get the prettier output of using the rake task....

@hlascelles
Copy link
Author

Thanks for looking at this @mateusdeap...

Puma can serve multiple requests in two ways (and it is recommended to use both).

  1. Process forking. Basically Rails "starts", then multiple processes are forked off which are all (initially) replicas. They no longer alter each others memory after that. If you add something to an in-memory cache, or change a global variable, the others will not be aware of it.
  2. Threading. In each process (above) Puma can spin up multiple threads. These are all in the same process, and can talk to each other and share memory. Global variable changes are shared.

More discussion about it is here: https://stackoverflow.com/questions/24280743/what-is-the-difference-between-workers-and-threads-in-puma.

I'd say the simplest solution may be for there to be a non-Threadsafe "default locale" which can be set during Rails startup (before any forking). Maybe then you could (in tests) set a thread local one... And the "lookup" code internally in faker would check the Thread local one, then if not found fall back to the global one?

@mateusdeap
Copy link
Contributor

I have some interesting findings to report.

After some more testing and head scratching, I think I understood Threads in ruby.

The gist of it is this:

  1. Puma can fork new processes, via sys calls, and it can create new ruby threads (using Thread.new). In both those cases, I don't think ruby has any way to let us share data between separate threads.
  2. The one case that, as per the Thread class docs, we can access variables defined in a given thread is either directly in that thread's scope or, at most, in the scope of a Fiber started in the given Thread.

To confirm this I tried the following test:

  def test_default_locale_is_thread_global
    Thread.new do
      Faker.configure do |config|
        config.locale = :pt
      end
      p Faker.configuration.locale
      assert_equal :pt, Faker.configuration.locale
      Thread.new do
        p Faker.configuration.locale
        assert_equal :en, Faker.configuration.locale
      end.join
    end.join
  end

The print statements were for debugging. But actually stepping through this code, I am unable to see the variables set in the scope of the first Thread from within the second Thread I start, so if puma does that, I don't think it is possible to share the locale. Unless of course I missed something, so I'm open to feedback or ideas.

@mateusdeap
Copy link
Contributor

mateusdeap commented Dec 15, 2022

Another thing I found out was that the previous thread safe implementation, by storing variables using Thread#[]=, that is actually setting fiber-local variables. In other words, if we do that at start up and puma creates a new fiber, the variable is not visible within the fiber's scope.

To have the variable be accessible in that case, we need to use thread_variable_set and thread_variable_get. I could try and swap those to see if it would work for your use case, @hlascelles, but, again, as long as puma does things in a way that doesn't put you in completely different thread scope...

@mateusdeap
Copy link
Contributor

Again, open to any input and other info.

Also, concerning the whole idea of being able to configurate faker using something like:

Faker.configuration do |config|
end

I was able to make it work, but it's definitely a breaking change, even for the tests. At least the way I found on how to do it, which would require making the Config module a class. I'm still going to look into it and se if it can't be done with a module in a backwards compatible way.

@mateusdeap
Copy link
Contributor

Now that I look at my comments and reread your comment, @hlascelles , I think I completely missed the point and just focused on having Faker be thread safe and work for what you want. Will attempt using some form of global setting. But I think we'd might still be plagued by some thread puma starts not having the right settings.

@stefannibrasil
Copy link

I think that @thdaraujo has some thoughts about how to add a config option without adding a breaking change.

@thdaraujo
Copy link
Contributor

thdaraujo commented Jan 17, 2023

(sorry for the late reply!)

Thank you so much for looking into it, @mateusdeap! I think you're right, this is a trickier problem than I initially imagined.

I'm not sure yet how to solve this. I think a global setting could be a good direction, but I need to do a bit more research. We are going to talk with Nate on Friday, so he might have some ideas to share.

The other point to consider is that on Puma, a given request might be executed by an existing thread from the ThreadPool and the locale setting will "leak" to this new request. Which makes sense, but I'm not sure about the implications, or if we can even do something about it. (see request_store)

I hope to have a bit more info to share next week.

This issue on i18n seems very similar to our current problem, so maybe we can learn something from it.

@nateberkopec
Copy link

Does Faker 2.23 work correctly in Puma's single mode (workers = 0) with multiple threads (threads 2,2)?

I'm guessing it does not.

@nateberkopec
Copy link

nateberkopec commented Jan 20, 2023

I'm pretty sure the only mistake in #2520 was removing the locale class variable. Instead, it should have added the fiber-local as the new, first source of truth and fell back to @locale if that fiber local is not set.

In essence,

def locale
  Thread.current[:faker_config_locale] || @locale || (I18n.available_locales.include?(I18n.locale) ? I18n.locale : I18n.available_locales.first)
end

As for setting the locale, that's a bit trickier. I don't understand the desired behavior, but I think what you want is to set a "default" locale, and then be able to set the locale again on a per-thread basis. So you probably want a configuration option which sets the default locale in a non-threadsafe way (i.e. changing the @locale class variable) and then an API which sets the locale for the current thread.

@stefannibrasil
Copy link

thank you @nateberkopec. This one is tricky, and I appreciate your help!

@mateusdeap are you interested in opening a PR following what Nate has shared?

@stefannibrasil
Copy link

Quick reminder: once this issue is solved, we also need to update the README. There is a note about this issue there.

@mateusdeap
Copy link
Contributor

@stefannibrasil Will do next week!

@mateusdeap
Copy link
Contributor

@stefannibrasil More like a month rather than a week.

freesteph added a commit to betagouv/aplypro-mock-data that referenced this issue May 22, 2023
We're hitting a Faker bug[1] that doesn't allow setting the locale in a
threaded environment (like Puma) so pin the version that does until
then.

[1]: faker-ruby/faker#2563
This was linked to pull requests Jul 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants