Speed issue of precision-recall quality control layer #456

Closed
MaratKhabibullin opened this issue Sep 29, 2021 · 5 comments · Fixed by #533

Comments

@MaratKhabibullin

Please implement the calculation of positivesCorrect, positivesTotal, negativesCorrect, and negativesTotal on the device side. The current implementation copies both blobs to the host and counts there:

void CPrecisionRecallLayer::RunOnceAfterReset()
{
	CPtr<CDnnBlob> inputBlob = inputBlobs[0];
	CPtr<CDnnBlob> expectedLabelsBlob = inputBlobs[1];
	CArray<float> labels;
	labels.SetSize( expectedLabelsBlob->GetObjectCount() );
	expectedLabelsBlob->CopyTo( labels.GetPtr(), labels.Size() );
	CArray<float> networkOutputs;
	networkOutputs.SetSize( inputBlob->GetObjectCount() );
	inputBlob->CopyTo( networkOutputs.GetPtr(), networkOutputs.Size() );
	for( int i = 0; i < inputBlob->GetObjectCount(); i++ ) {
		if( labels[i] > 0 ) {
			if( networkOutputs[i] >= 0 ) {
				positivesCorrect++;
			}
			positivesTotal++;
		} else {
			if( networkOutputs[i] < 0 ) {
				negativesCorrect++;
			}
			negativesTotal++;
		}
	}

@MaratKhabibullin
Author

Example:

	
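	// Network output (logits); assumed here to be the layer's first input
	CConstFloatHandle calculatedLogit = inputBlobs[0]->GetData();
	CConstFloatHandle groundtruth = inputBlobs[1]->GetData();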

	CConstFloatHandle notIgnoredSegmentsMask = inputBlobs[2]->GetData();

	const int vectorSize = inputBlobs[0]->GetDataSize();

	CFloatHandleStackVar minusOne( MathEngine() );
	minusOne.SetValue( -1.f );
	CFloatHandleStackVar zero( MathEngine() );
	zero.SetValue( 0.0f );
	CFloatHandleStackVar ones( MathEngine(), vectorSize );
	MathEngine().VectorFill( ones, 1.0f, vectorSize );
	
	// Mask of the elements classified as the positive class.
	// Computed with a threshold of 0.5, which means checking the sign
	// of the logit: it is positive for the positive class.
	CFloatHandleStackVar binarizedCalculation( MathEngine(), vectorSize );
	MathEngine().VectorReLUDiff( calculatedLogit, ones, binarizedCalculation, vectorSize, zero );

	// Mask of the ground-truth positive elements
	CFloatHandleStackVar binarizedLabel( MathEngine(), vectorSize );
	MathEngine().VectorReLUDiff( groundtruth, ones, binarizedLabel, vectorSize, zero );
	
	// Mask of the correctly classified positive elements
	CFloatHandleStackVar positiveMask( MathEngine(), vectorSize );
	// 1 where both vectors hold 1; 0 otherwise
	MathEngine().VectorEltwiseMin( binarizedLabel, binarizedCalculation, positiveMask, vectorSize );
	// Count only the non-ignored segments
	MathEngine().VectorEltwiseMultiply( positiveMask, notIgnoredSegmentsMask, positiveMask, vectorSize );

	// Number of correctly classified positive elements
	CFloatHandleStackVar positiveCorrectTotal( MathEngine() );
	MathEngine().VectorSum( positiveMask, vectorSize, positiveCorrectTotal );

	// Total number of positive-class elements
	CFloatHandleStackVar positivesTotal( MathEngine(), 1 );
	CFloatHandleStackVar temp( MathEngine(), vectorSize );
	MathEngine().VectorCopy( temp, binarizedLabel, vectorSize );
	// Count only the non-ignored segments
	MathEngine().VectorEltwiseMultiply( temp, notIgnoredSegmentsMask, temp, vectorSize );
	MathEngine().VectorSum( temp, vectorSize, positivesTotal );

	// Mask of the correctly classified negative elements.
	CFloatHandleStackVar negativesCorrect( MathEngine(), vectorSize );
	// 0 where both vectors hold 0; 1 otherwise.
	MathEngine().VectorEltwiseMax( binarizedLabel, binarizedCalculation, negativesCorrect, vectorSize );
	// At this point the correctly classified negative elements have the value 0.
	// Invert the vector.
	// {0, 1} -> {-1, 0}
	MathEngine().VectorAddValue( negativesCorrect, negativesCorrect, vectorSize, minusOne );
	// {-1, 0} -> {1, 0}
	MathEngine().VectorAbs( negativesCorrect, negativesCorrect, vectorSize );
	// Count only the non-ignored segments
	MathEngine().VectorEltwiseMultiply( negativesCorrect, notIgnoredSegmentsMask, negativesCorrect, vectorSize );

	// Number of correctly classified negative elements
	CFloatHandleStackVar negativesCorrectTotal( MathEngine() );
	MathEngine().VectorSum( negativesCorrect, vectorSize, negativesCorrectTotal );

	// Total number of negative-class elements
	CFloatHandleStackVar negativesTotal( MathEngine() );
	// At this point the negative elements have the value 0.
	// Invert the vector.
	// {0, 1} -> {-1, 0}
	MathEngine().VectorAddValue( binarizedLabel, binarizedLabel, vectorSize, minusOne );
	// {-1, 0} -> {1, 0}
	MathEngine().VectorAbs( binarizedLabel, binarizedLabel, vectorSize );
	// Count only the non-ignored segments
	MathEngine().VectorEltwiseMultiply( binarizedLabel, notIgnoredSegmentsMask, binarizedLabel, vectorSize );
	MathEngine().VectorSum( binarizedLabel, vectorSize, negativesTotal );

	negativesCount += to<int>( negativesTotal.GetValue() );
	positivesCount += to<int>( positivesTotal.GetValue() );
	truePositivesCount += to<int>( positiveCorrectTotal.GetValue() );
	trueNegativeCount += to<int>( negativesCorrectTotal.GetValue() );

	assert( positivesCount >= 0 );
	assert( negativesCount >= 0 );
	assert( trueNegativeCount <= negativesCount );
	assert( truePositivesCount <= positivesCount );
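
To make the mask arithmetic in the example easier to check, here is a minimal host-only sketch in plain C++ that compares a branchy per-element loop with the same min / max / add / abs / multiply / sum sequence. It does not use NeoML: the `Counts` struct and both function names are hypothetical, and the binarization assumes that `VectorReLUDiff` passes 1 only where its first argument is strictly positive. Under that assumption a logit of exactly 0 counts as negative, whereas the host loop in the issue description treats `>= 0` as positive, so the two can differ on exact zeros.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the four counters; not a NeoML type.
struct Counts {
    int positivesCorrect = 0;
    int positivesTotal = 0;
    int negativesCorrect = 0;
    int negativesTotal = 0;
};

// Reference version: one branchy pass over the elements, skipping ignored ones.
Counts countWithLoop(const std::vector<float>& logits,
                     const std::vector<float>& labels,
                     const std::vector<float>& notIgnoredMask)
{
    assert(logits.size() == labels.size() && labels.size() == notIgnoredMask.size());
    Counts c;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        if (notIgnoredMask[i] == 0.f) {
            continue;
        }
        if (labels[i] > 0.f) {
            if (logits[i] > 0.f) {
                c.positivesCorrect++;
            }
            c.positivesTotal++;
        } else {
            if (!(logits[i] > 0.f)) {
                c.negativesCorrect++;
            }
            c.negativesTotal++;
        }
    }
    return c;
}

// Branch-free version: the same element-wise operations as the example above,
// written out per element instead of as vector calls.
Counts countWithMasks(const std::vector<float>& logits,
                      const std::vector<float>& labels,
                      const std::vector<float>& notIgnoredMask)
{
    assert(logits.size() == labels.size() && labels.size() == notIgnoredMask.size());
    float pc = 0.f, pt = 0.f, nc = 0.f, nt = 0.f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const float predicted = logits[i] > 0.f ? 1.f : 0.f;  // binarizedCalculation
        const float actual = labels[i] > 0.f ? 1.f : 0.f;     // binarizedLabel
        pc += std::min(actual, predicted) * notIgnoredMask[i];
        pt += actual * notIgnoredMask[i];
        // max is 0 only where both are 0; subtracting 1 and taking abs inverts it.
        nc += std::fabs(std::max(actual, predicted) - 1.f) * notIgnoredMask[i];
        nt += std::fabs(actual - 1.f) * notIgnoredMask[i];
    }
    Counts c;
    c.positivesCorrect = static_cast<int>(pc);
    c.positivesTotal = static_cast<int>(pt);
    c.negativesCorrect = static_cast<int>(nc);
    c.negativesTotal = static_cast<int>(nt);
    return c;
}
```

The float accumulators stay exact as long as each sum is below 2^24, which holds for the blob size mentioned later in this thread.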

@FedyuninV
Contributor

Have you tested its performance? Have you met any networks where the current version leads to a significant decrease in speed?

My arguments:

  1. It's hard to find an example where quality control takes a significant share of the training time.
  2. The amount of data is too small, so the CUDA kernel grids will be too small, and the kernel launch overhead may negate the gains from GPU parallelism.
  3. The example above calls .GetValue(), which leads to a cudaMemcpy, which is synchronous. As a result, this example has no advantage in terms of CUDA device synchronization.
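
The third point can be illustrated with a small host-only sketch. It assumes, hypothetically, that each `.GetValue()` call costs one synchronous device-to-host copy, and shows how keeping the four counters in one contiguous buffer would reduce four readbacks to one. `FakeDeviceCounters` is an invented stand-in that only counts readbacks; it is not a NeoML or CUDA API, and a single readback would still synchronize once.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for four counters living in device memory.
// Every read from the host side is counted as one synchronous readback.
class FakeDeviceCounters {
public:
    // Accumulate into one counter (index 0..3).
    void add(std::size_t index, float value) { data_.at(index) += value; }

    // One readback per counter, as with four separate GetValue() calls.
    float readOne(std::size_t index)
    {
        ++readbacks_;
        return data_.at(index);
    }

    // One readback for all four counters at once.
    std::array<float, 4> readAll()
    {
        ++readbacks_;
        return data_;
    }

    int readbacks() const { return readbacks_; }

private:
    std::array<float, 4> data_{};  // zero-initialized
    int readbacks_ = 0;
};
```

Reading the four values one by one leaves `readbacks()` at 4, while `readAll()` leaves it at 1.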

@MaratKhabibullin
Author

Have you tested its performance?

Yes.

Have you met any networks where the current version leads to a significant decrease in speed?

Yes, a semantic segmentation net with a large output.

@FedyuninV
Contributor

FedyuninV commented Sep 29, 2021

OK then, could you please provide the inputBlobs sizes of this layer in the segmentation net?

@MaratKhabibullin
Author

832 * 320 (H x W)
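
For a sense of scale, the arithmetic behind that size can be sketched as follows. It assumes one float value per pixel and a batch of one, neither of which is stated in the thread, and it counts only the two blobs that the current host-side code copies (network output and labels). The function names are illustrative.

```cpp
#include <cstddef>

// Number of elements in one H x W blob, assuming a single value per pixel.
constexpr std::size_t elementsPerBlob(std::size_t height, std::size_t width)
{
    return height * width;
}

// Bytes copied to the host when `blobs` float blobs of that size are read back.
constexpr std::size_t hostCopyBytes(std::size_t elements, std::size_t blobs)
{
    return elements * blobs * sizeof(float);
}

// The byte count below assumes a 4-byte float.
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");
static_assert(elementsPerBlob(832, 320) == 266240, "832 * 320 elements per blob");
static_assert(hostCopyBytes(elementsPerBlob(832, 320), 2) == 2129920,
              "two float blobs copied per call");
```

Under these assumptions each call copies about 2.1 MB to the host and then runs a loop of 266,240 iterations on the CPU.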
