rasdaemon: Fix for regression in ras_mc_create_table() if some cpus a… #93
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
…re offline at the system start
Issues:
Regression in the ras_mc_create_table() if some of the cpus are offline at the system start when run the rasdaemon. This issue is reproducible in ras_mc_create_table() with decode and record non-standard events and reproducible sometimes with ras_mc_create_table() for the standard events.
Also in the multi thread way, there is memory leak in ras_mc_event_opendb() as struct sqlite3_priv *priv and sqlite3 *db allocated/initialized per thread, but stored in the common struct ras_events ras in pthread data, which is shared across the threads.
Reason:
when the system start with some of the cpus are offline and then run the rasdaemon, read_ras_event_all_cpus() exit with error and switch to the multi thread way. However read() in read_ras_event() return error in threads for each of the offline CPUs and does clean up including calling ras_mc_event_closedb().
Since the 'struct ras_events ras' passed in the pthread_data to each of the threads is common, struct sqlite3_priv *priv and sqlite3 *db allocated/ initialized per thread and stored in the common 'struct ras_events ras', are getting overwritten in each ras_mc_event_opendb()(which called from pthread per cpu), result memory leak. Also when ras_mc_event_closedb() is called in the above error case from the threads corresponding to the offline cpus, close the sqlite3 *db and free sqlite3_priv *priv stored in the common 'struct ras_events ras', result regression when accessing priv->db in the ras_mc_create_table() from another context later.
Proposed solution:
In ras_mc_event_opendb(), allocate struct sqlite3_priv *priv, init sqlite3 *db and create tables common for the threads with shared 'struct ras_events ras' based on a reference count and free them in the same way.
Also protect critical code ras_mc_event_opendb() and ras_mc_event_closedb() using mutex in the multi thread case from any regression caused by the thread pre-emption.
Reported-by: Lei Feng fenglei47@h-partners.com