improve error handling by using error-stack crate #104
Conversation
Also, after testing your changes it seems that the errors are not getting logged on stdout when they occur, so I would suggest making the following changes in order to get this fixed 🙂.
In file src/bin/websurfx.rs, you will need to change the code to something like this (as shown below):
//! Main module of the application
//!
//! This module contains the main function which handles the logging of the application to the
//! stdout and handles the command line arguments provided and launches the `websurfx` server.

use std::net::TcpListener;

use websurfx::{config_parser::parser::Config, run};

/// The function that launches the main server and registers all the routes of the website.
///
/// # Error
///
/// Returns an error if the port is being used by something else on the system and is not
/// available for being used for other applications.
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Initialize the parsed config file.
    let config = Config::parse().unwrap();

    // Initialize the logging middleware to log all types of logging events including errors
    // and info-level events.
    if config.logging || config.debug {
        env_logger::init();
    }

    log::info!("started server on port {}", config.port);

    let listener = TcpListener::bind((config.binding_ip_addr.clone(), config.port))?;

    run(listener, config)?.await
}
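The config-gated logger initialization above can be sketched without the env_logger dependency. The `Config` struct and `init_logging` helper here are minimal stand-ins mirroring the snippet, not the real websurfx types:

```rust
// A minimal stand-in for the app's parsed config; the field names are
// assumptions mirroring the snippet above, not the real struct.
struct Config {
    logging: bool,
    debug: bool,
}

// In the real app this branch would call env_logger::init(); here we just
// report whether initialization would have happened.
fn init_logging(config: &Config) -> bool {
    config.logging || config.debug
}

fn main() {
    let verbose = Config { logging: false, debug: true };
    let quiet = Config { logging: false, debug: false };
    assert!(init_logging(&verbose));
    assert!(!init_logging(&quiet));
    println!("logger initialized: {}", init_logging(&verbose));
}
```

The point of the gate is that either flag alone is enough to enable logging, which is why errors were silent when both were off.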
In file src/search_results_handler/aggregator.rs, you will need to change the code to something like this (as shown below):
//! This module provides the functionality to scrape and gather all the results from the upstream
//! search engines and then remove duplicate results.

use std::{collections::HashMap, time::Duration};

use rand::Rng;
use tokio::join;

use super::{
    aggregation_models::{RawSearchResult, SearchResult, SearchResults},
    user_agent::random_user_agent,
};

use crate::engines::{duckduckgo, searx};

/// A function that aggregates all the scraped results from the upstream engines and then removes
/// duplicate results. If a result is found to be from two or more engines, their names are put
/// together to show which upstream engines the result was fetched from. All data is then moved
/// out of the HashMap into a struct that holds the aggregated results in a vector, along with the
/// query used. Keeping the query is necessary because otherwise the search bar remains empty when
/// a search is made from the query url.
///
/// # Example:
///
/// If you search from the url like `https://127.0.0.1/search?q=huston` then the search bar should
/// contain the word huston and not remain empty.
///
/// # Arguments
///
/// * `query` - Accepts a string to query with the above upstream search engines.
/// * `page` - Accepts an u32 page number.
/// * `random_delay` - Accepts a boolean value to add a random delay before making the request.
///
/// # Error
///
/// Returns a reqwest or scraping selector error if any error occurs in the `results` function of
/// either `searx` or `duckduckgo` or both, otherwise returns a `SearchResults` struct containing
/// the appropriate values.
pub async fn aggregate(
    query: &str,
    page: u32,
    random_delay: bool,
    debug: bool,
) -> Result<SearchResults, Box<dyn std::error::Error>> {
    let user_agent: String = random_user_agent();
    let mut result_map: HashMap<String, RawSearchResult> = HashMap::new();

    // Add a random delay before making the request.
    if random_delay || !debug {
        let mut rng = rand::thread_rng();
        let delay_secs = rng.gen_range(1..10);
        std::thread::sleep(Duration::from_secs(delay_secs));
    }

    // Fetch results from the upstream search engines concurrently.
    let (ddg_map_results, searx_map_results) = join!(
        duckduckgo::results(query, page, &user_agent),
        searx::results(query, page, &user_agent)
    );

    let ddg_map_results: HashMap<String, RawSearchResult> = match ddg_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option has been
                // enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };

    let searx_map_results: HashMap<String, RawSearchResult> = match searx_map_results {
        Ok(result) => result,
        Err(e) => {
            if debug {
                // Log the verbose error with all the information if the debug option has been
                // enabled in the config file.
                log::error!("\n{e:?}");
            }
            HashMap::new()
        }
    };

    result_map.extend(ddg_map_results);

    searx_map_results.into_iter().for_each(|(key, value)| {
        result_map
            .entry(key)
            .and_modify(|result| {
                result.add_engines(value.clone().engine());
            })
            .or_insert_with(|| -> RawSearchResult {
                RawSearchResult::new(
                    value.title.clone(),
                    value.visiting_url.clone(),
                    value.description.clone(),
                    value.engine.clone(),
                )
            });
    });

    Ok(SearchResults::new(
        result_map
            .into_iter()
            .map(|(key, value)| {
                SearchResult::new(
                    value.title,
                    value.visiting_url,
                    key,
                    value.description,
                    value.engine,
                )
            })
            .collect(),
        query.to_string(),
    ))
}
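The deduplication step in `aggregate` can be sketched in isolation. The `(url, title, engine)` tuple shape and the `merge_results` helper below are illustrative stand-ins, not the real `RawSearchResult` API:

```rust
use std::collections::HashMap;

/// Merge (url, title, engine) triples from two engines, keyed on the URL.
/// When both engines return the same page, both engine names are recorded,
/// mirroring the entry/and_modify/or_insert pattern in the aggregator.
fn merge_results<'a>(
    ddg: Vec<(&'a str, &'a str, &'a str)>,
    searx: Vec<(&'a str, &'a str, &'a str)>,
) -> HashMap<&'a str, (&'a str, Vec<&'a str>)> {
    let mut merged: HashMap<&str, (&str, Vec<&str>)> = HashMap::new();
    for (url, title, engine) in ddg.into_iter().chain(searx) {
        merged
            .entry(url)
            .and_modify(|(_, engines)| engines.push(engine))
            .or_insert((title, vec![engine]));
    }
    merged
}

fn main() {
    // As if returned by the two upstream engines, with one shared result.
    let ddg = vec![("https://a.example", "A", "duckduckgo")];
    let searx = vec![
        ("https://a.example", "A", "searx"),
        ("https://b.example", "B", "searx"),
    ];
    let merged = merge_results(ddg, searx);
    assert_eq!(merged["https://a.example"].1, vec!["duckduckgo", "searx"]);
    assert_eq!(merged["https://b.example"].1, vec!["searx"]);
    println!("merged into {} unique results", merged.len());
}
```

Keying on the URL is what makes the dedup work: the title and description from the first engine win, and only the engine list grows on a collision.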
Now you can test the app as usual. One way to trigger the error is to open your browser and provide a large page number in the query, then check that the expected error appears in the log:
http://127.0.0.1:8080/search?q=ls&page=10000000
Co-authored-by: neon_arch <mustafadhuleb53@gmail.com>
What is the reason for modifying the code in …
The reason to modify the code was to allow the env_logger to also include …
I think that setting the log level to …
Ok, I see 👀 but just to get things right I will test it one last time and see if it works, because you know we need to make sure that it works right on all systems, including ours. So just give me a few seconds; if everything is working as expected then I will approve this PR, because then it will be all good and ready to be merged 🙂. What do you say?
Ok, after testing the changes one last time 🙂 I found that, as you (@xffxff) suggested, it does indeed work. It looks like it was a general misconfiguration of the system on my side. Now that everything works as expected and the changes look perfect, I am approving and merging this PR 🙂. Also, thanks for putting up this PR, I really appreciate it 🙂.
What does this PR do?
This PR improves the errors generated while fetching the results from upstream search engines, making them more readable and providing more context, and also makes the logs more verbose when the debug option is turned on, as proposed in issue #83.
Why is this change important?
This change is essential as it improves the readability of errors and gives errors more context when debugging the app which allows a better in-depth understanding of the problem/issue/bug.
How to test this PR locally?
It can be tested by installing and running Websurfx as mentioned in the docs and in the README, then launching the browser and providing a query with a large page value.
Related issues
close #99 and close #83
Additional Notes
Additionally, once this PR is merged, we can create a new task for each engine search using tokio::spawn. As pointed out by @neon-mmd in this comment, we were unable to use tokio::spawn because the scraper::error::SelectorErrorKind error type returned by the result function does not implement Sync and Send. However, we have now addressed this issue by eliminating SelectorErrorKind and returning an UnexpectedError instead. Consequently, all errors returned by the result function are now thread-safe.