Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

candle transformation is very slow #60

Closed
pavlexander opened this issue Sep 10, 2018 · 4 comments
Closed

candle transformation is very slow #60

pavlexander opened this issue Sep 10, 2018 · 4 comments
Assignees

Comments

@pavlexander
Copy link

pavlexander commented Sep 10, 2018

I have encountered a situation with a big set of data, that transform function is very slow and needs improvements.

  • start set of data: 385,577
  • start interval: 1 Minute
  • end set of data: 3,243 (without fix: 3,221)
  • end interval: 2 Hours

Default time taken to execute: 150,000 ms
Time taken to execute after optimizations: 29,000 ms

Code:

IEnumerable<IOhlcv> ienumCandles = tradyCandles.AsEnumerable();
var transformedTradyCandles = ienumCandles.Transform<PerMinute, BiHourly>();

Before:

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
	var candle = candles.Where(c => c.DateTime >= startTime && c.DateTime < endTime);
	if (candle.Any())
	{
		var dateTime = candle.First().DateTime;
		var open = candle.First().Open;
		var high = candle.Max(stick => stick.High);
		var low = candle.Min(stick => stick.Low);
		var close = candle.Last().Close;
		var volume = candle.Sum(stick => stick.Volume);
		return new Candle(dateTime, open, high, low, close, volume);
	}
	return null;
}

After:

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
	List<IOhlcv> inputCandles = candles.ToList();
	List<IOhlcv> candle = new List<IOhlcv>();
	bool started = false;
	for (int i = 0; i < candles.Count(); i++)
	{
		if (!started && inputCandles[i].DateTime >= startTime)
		{
			started = true;
			candle.Add(inputCandles[i]);
		}
		else if (started)
		{
			if (inputCandles[i].DateTime >= endTime)
			{
				break;
			}

			candle.Add(inputCandles[i]);
		}
	}

	if (candle.Any())
	{
		var dateTime = candle.First().DateTime;
		var open = candle.First().Open;
		var high = candle.Max(stick => stick.High);
		var low = candle.Min(stick => stick.Low);
		var close = candle.Last().Close;
		var volume = candle.Sum(stick => stick.Volume);
		return new Candle(dateTime, open, high, low, close, volume);
	}
	return null;
}

While I do see, that there is some difference in output result (I don't have time now, to study what exactly is wrong) - the improvement is seen with a naked eye.

The only requirement is that data needs to be ordered. It can be done as the first step using LINQ. In my set of data all candles were in correct order so I skipped that part.

Additionally, I think transformation can be performed in parallel for different date-time periods.

I would appreciate if you investigated this issue more deeply and came up with some code optimizations.

P.S. Yes, according to profiler ComputeIOhlcvData takes the most time to execute. I think that the reason for that is that LINQ is iterating over all records over and over again. In the best case scenario (if data is ordered) - we need to do it only once!!! (in default implementation the code does it at least 3,000 times) In my new implementation, code also does it 3,000 time, but, it quits iterating over candles as soon as first out-of range candle is encountered. It gave 5x performance boost. I imagine that completely-rewritten transformation will get the job done in at most 1/5 time than my result..

@pavlexander
Copy link
Author

Here, I have made an example, which will ONLY work for 1Minute to 2Hours transformation.

  • Default time taken to execute: 150,000 ms (records: 3,221)
  • Time taken to execute after optimizations Nr.1: 29,000 ms (records: 3,243)
  • Time taken to execute after optimizations Nr.2: 132 ms (records: 3,242)

Old code:

public static IReadOnlyList<IOhlcv> Transform<TSourcePeriod, TTargetPeriod>(this IEnumerable<IOhlcv> candles)
	where TSourcePeriod : IPeriod
	where TTargetPeriod : IPeriod
{
	if (typeof(TSourcePeriod).Equals(typeof(TTargetPeriod)))
		return candles.ToList();

	if (!IsTimeframesValid<TSourcePeriod>(candles, out IOhlcv err))
		throw new InvalidTimeframeException(err.DateTime);

	if (!IsTransformationValid<TSourcePeriod, TTargetPeriod>())
		throw new InvalidTransformationException(typeof(TSourcePeriod), typeof(TTargetPeriod));

	var outIOhlcvDatas = new List<IOhlcv>();
	var periodInstance = Activator.CreateInstance<TTargetPeriod>();
	
	var startTime = candles.First().DateTime;
	while (startTime <= candles.Last().DateTime)
	{
		var nextStartTime = periodInstance.NextTimestamp(startTime);
		if (periodInstance.IsTimestamp(startTime))
		{
			var outIOhlcvData = ComputeIOhlcvData(candles, startTime, nextStartTime);
			if (outIOhlcvData != null)
				outIOhlcvDatas.Add(outIOhlcvData);
		}
		startTime = nextStartTime;
	}

	return outIOhlcvDatas;
}

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
	var candle = candles.Where(c => c.DateTime >= startTime && c.DateTime < endTime);

	if (candle.Any())
	{
		var dateTime = candle.First().DateTime;
		var open = candle.First().Open;
		var high = candle.Max(stick => stick.High);
		var low = candle.Min(stick => stick.Low);
		var close = candle.Last().Close;
		var volume = candle.Sum(stick => stick.Volume);
		return new Candle(dateTime, open, high, low, close, volume);
	}
	return null;
}

New code (optimizations Nr. 2)

public static IReadOnlyList<IOhlcv> Transform<TSourcePeriod, TTargetPeriod>(this IEnumerable<IOhlcv> candles)
	where TSourcePeriod : IPeriod
	where TTargetPeriod : IPeriod
{
	if (typeof(TSourcePeriod).Equals(typeof(TTargetPeriod)))
		return candles.ToList();

	if (!IsTimeframesValid<TSourcePeriod>(candles, out IOhlcv err))
		throw new InvalidTimeframeException(err.DateTime);

	if (!IsTransformationValid<TSourcePeriod, TTargetPeriod>())
		throw new InvalidTransformationException(typeof(TSourcePeriod), typeof(TTargetPeriod));

	var outIOhlcvDatas = new List<IOhlcv>();
	
	var startTime = candles.First().DateTime;
	var periodStart = GetPeriodStart(startTime);
	var periodEnd = periodStart.AddHours(2);
	List<IOhlcv> inputCandles = candles.ToList();
	List<IOhlcv> candle = new List<IOhlcv>();
	for (int i = 0; i < inputCandles.Count(); i++)
	{
		var dt = inputCandles[i].DateTime;
		if (dt >= periodEnd)
		{
			periodStart = periodEnd;
			periodEnd = periodStart.AddHours(2);

			ComputeIOhlcvDataPreprocessed(outIOhlcvDatas, candle);
			candle = new List<IOhlcv>();
		}

		candle.Add(inputCandles[i]);
	}

	if (candle.Any())
	{
		ComputeIOhlcvDataPreprocessed(outIOhlcvDatas, candle);
		candle = null;
	}

	return outIOhlcvDatas;
}

private static DateTimeOffset GetPeriodStart(DateTimeOffset dt)
{
	return new DateTimeOffset(dt.Year, dt.Month, dt.Day, dt.Hour, 0, 0, dt.Offset);
}

private static void ComputeIOhlcvDataPreprocessed(List<IOhlcv> result, IEnumerable<IOhlcv> candle)
{
	if (candle.Any())
	{
		var dateTime = candle.First().DateTime;
		var open = candle.First().Open;
		var high = candle.Max(stick => stick.High);
		var low = candle.Min(stick => stick.Low);
		var close = candle.Last().Close;
		var volume = candle.Sum(stick => stick.Volume);

		result.Add(new Candle(dateTime, open, high, low, close, volume));
	}
}

And again, I will repeat myself - the code is done for demo purposes. It has some hard-coded periods. You are free to modify it to make a universal solution out of that.

It's just to show that there is a huge room for improvement.

@karlwancl
Copy link
Owner

@pavlexander Thanks for the investigation & the optimized code. You're right with the 1st optimization, LINQ is lazy-evaluated, the Where clause have been evaluated every Any(), First(), Last(), Sum(), Min(), Max() operation.

A traditional for-loop (your approach) to create a new list is also a solver to this problem. I think ToList() after the Where clause may also work like the traditional loop but i have to verify.

You're also right with the 2nd optimization, the Where clause in each while loop is too costly, using your approach (iterate candles through the list) is better.

The part will be updated maybe in this weekend. Again, thanks for your contribution :D

@karlwancl
Copy link
Owner

@pavlexander The code is patched. Please see if the performance issue has fixed 😃

@pavlexander
Copy link
Author

@lppkarl yes, I can verify that now it's 437x faster than previous implementation :)

Thank you for taking care of this issue and of course, for Trady! You are doing a great job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants